Abstract: A test based on tapering is proposed for use in testing a global linear hypothesis under a functional linear model. The test statistic is constructed as a weighted sum of squared linear combinations of Fourier coefficients, a tapered quadratic form, in which higher Fourier frequencies are down-weighted so as to emphasize the smooth attributes of the model. A formula is $Q_n^{\rm PT} = n\sum_{j=1}^{p_n} j^{-1/2}\|Y_{n,j}\|^2$. Down-weighting by $j^{-1/2}$ is selected to achieve adaptive optimality among tests based on tapering with respect to "rates of testing," an asymptotic framework for measuring a test's retention of power in high dimensions under smoothness constraints. Existing tests based on truncation or thresholding are known to have superior asymptotic power in comparison with any test based on tapering; however, it is shown here that high-order effects can be substantial, and that a test based on $Q_n^{\rm PT}$ exhibits better (non-asymptotic) power against the sort of alternatives that would typically be of concern in functional data analysis applications. The proposed test is developed for use in practice, and demonstrated in an example application.
1. Introduction

The subject of this article is the functional linear model and functional linear hypothesis, both cornerstones of functional data analysis (FDA) methodology. This model is described as a sample of independent random functions (sometimes called curves or profiles), which here will be taken to have a common one-dimensional domain, $(a, b]$, and real-valued response. To facilitate study of asymptotic properties, the present investigation will adopt a replicated functional linear model, in which each of $N$ response points of dimension $P$ is replicated $n$ times. In matrix form, the $i$'th replication is

$dY_i(t) = X\beta(t)\,dt + \sigma\,d\epsilon_i(t)$, (1.1)

for $i = 1,\dots,n$, where $Y_i(t) = [Y_{i1}(t),\dots,Y_{iN}(t)]^T$ is a functional vector of responses on $t \in (a,b]$, and $X = [x_1,\dots,x_N]^T$ is an $N \times P$ "essence" regressor matrix, assumed of full column-rank, which stores the values of explanatory variables at which the response functions are measured. The vector $\beta(t) = [\beta_1(t),\dots,\beta_P(t)]^T$ is a $P \times 1$ functional vector of regression coefficients, and the $\epsilon_i$ are independent and identically distributed functional error vectors; each $\epsilon_i(t) = [\epsilon_{i,1}(t),\dots,\epsilon_{i,N}(t)]^T$ is a vector of independent and identically distributed error functions, for which $E[\epsilon_{i,k}(t)] = 0$ and $V[\epsilon_{i,k}(t)] = 1$ for each $(i,k)$ and $t$, and which are stationary on $t \in (a,b]$. The functional linear hypothesis is $H_0: L^T\beta(t) = 0$ for all $t \in (a,b]$ against a general alternative, where $L$ is a $P \times \nu$ hypothesis matrix of full column-rank. (The symbol $\nu$ is used to reflect the degrees of freedom in a standard test of such a hypothesis in the analogous univariate situation.)

After some initial preprocessing, functional data may typically be repre- 
sented by a discrete, high-dimensional model. Such representations are made 
here using Fourier decomposition, whose advantage in FDA is not only to dis- 
cretize the model but also to decorrelate the error structure and offer mean- 
ingful descriptive summarization. This is demonstrated on an existing data set 
in Section 2. (See also Fan and Lin, 1998, Spitzner, Marron, and Essick, 1998, 
Spitzner and Woodall, 2003, and Spitzner, 2008B for further demonstrations in 
FDA.) Also laid out in that section is how, taking into consideration the specific 
L, data collected under the model (1.1) may be translated to that of the model 

$Y_{n,j} = \theta_j + n^{-1/2}\epsilon_{n,j}$, (1.2)

for $j = 1,\dots,p_n$, where $p_n$ represents some (high) maximum number of dimensions to be accounted for at a given $n$. The statistics $Y_{n,j} = [Y_{n,j1},\dots,Y_{n,j\nu}]^T$ are $\nu$-dimensional discrete data vectors, each entry of which is a linear combination of Fourier coefficients (defined by a distinct column of $L$) that partly summarizes the information relevant to the specific null hypothesis. The $\theta_j = [\theta_{j1},\dots,\theta_{j\nu}]^T$ are mean vectors, and the $\epsilon_{n,j} = [\epsilon_{n,j1},\dots,\epsilon_{n,j\nu}]^T$ are zero-mean, unit-covariance error vectors such that the $\epsilon_{n,jk}$ are independent across $k$. (Across $j$, however, small correlations among the $\epsilon_{n,jk}$ are possible.) The functional linear hypothesis translates to

$H_0: \theta_j = 0$ for $j = 1,\dots,p_n$ versus $H_1$: not $H_0$, (1.3)

in which the matrix $L$ has been absorbed into the transformation from (1.1) to (1.2).

D.J. Spitzner/A powerful test based on tapering in FDA

A typical assumption made in FDA is that the functional parameter $\beta(t)$ in (1.1) is somehow "smooth." From an intuitive standpoint, this means that the $\beta_1(t),\dots,\beta_P(t)$ are each taken to be a conglomeration of mainly large-scale, sweeping shapes, which are represented by low-frequency Fourier components. For rigorous analysis, smoothness is expressed more technically in Section 3 by restricting $\beta(t)$ to a Sobolev class. At any rate, a key issue in testing is how to exploit the smoothness assumption, so as not to waste statistical power attempting to distinguish the "rougher" aspects of the model (i.e., the small-scale wiggly shapes). This is especially important for testing in high dimensions, where the vastness of the parameter space requires careful management of power.

The transformation to the discrete model (1.2) typically assigns smaller $j$ to model components associated with smoother functional attributes. Noting this, one would want a test that focuses power primarily on the model's lower-indexed components. One class of tests that does this is defined by test statistics of the form

$Q_n = n\sum_{j=1}^{p_n} w_{n,j}\|Y_{n,j}\|^2$, (1.4)

where each $0 \le w_{n,j} \le 1$ and $w_{n,j} \to 0$ as $j \to \infty$, to emphasize the $Y_{n,j}$ with smaller $j$. The particular test of interest in this article is defined by the weight setting $w_{n,j} = j^{-1/2}$, which rewrites (1.4) as $Q_n^{\rm PT} = n\sum_{j=1}^{p_n} j^{-1/2}\|Y_{n,j}\|^2$.
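To make (1.4) concrete, the following Python sketch (using NumPy) computes $Q_n^{\rm PT}$ from the vectors $Y_{n,j}$ and approximates its null distribution by simulation. It is an illustration under assumptions not made in the text: the errors in (1.2) are taken to be Gaussian and uncorrelated across $j$, so that under $H_0$ each $n\|Y_{n,j}\|^2$ is an independent $\chi^2_\nu$ variable. The function names are hypothetical.

```python
import numpy as np

def tapered_statistic(Y, n, weights=None):
    """Q_n = n * sum_j w_{n,j} * ||Y_{n,j}||^2, as in (1.4).

    Y has shape (p_n, nu); row j-1 holds the vector Y_{n,j}.
    By default the tapering weights w_{n,j} = j^{-1/2} are used (Q_n^PT).
    """
    Y = np.asarray(Y, dtype=float)
    if weights is None:
        weights = np.arange(1, Y.shape[0] + 1) ** -0.5
    w = np.asarray(weights, dtype=float)
    return n * float(np.sum(w * np.sum(Y ** 2, axis=1)))

def simulated_null_pvalue(q_obs, p_n, nu, n_sim=20_000, seed=0):
    """Monte Carlo p-value for Q_n^PT, assuming Gaussian errors independent
    across j, so that n * ||Y_{n,j}||^2 ~ chi-square(nu) under H0."""
    rng = np.random.default_rng(seed)
    w = np.arange(1, p_n + 1) ** -0.5
    null_draws = rng.chisquare(nu, size=(n_sim, p_n)) @ w
    return float(np.mean(null_draws >= q_obs))

# Hypothetical usage: data generated under H0 with n = 20, p_n = 65, nu = 2.
rng = np.random.default_rng(1)
n, p_n, nu = 20, 65, 2
Y = rng.standard_normal((p_n, nu)) / np.sqrt(n)
q = tapered_statistic(Y, n)
p_value = simulated_null_pvalue(q, p_n, nu)
```

Passing a different `weights` vector to the same function gives any other statistic of the form (1.4); with all weights equal to one it reduces to the untapered sum of squares.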

This technique of managing power by direct down-weighting, as in (1.4), shall be referred to as "tapering." Detailed asymptotic power properties of tests based on tapering are deduced in Spitzner (2008A), some of whose results are reproduced in Section 3. There, it is shown that $Q_n^{\rm PT}$ manages asymptotic power in an optimal way among test statistics of the form (1.4).

The asymptotic performance criteria used in Spitzner (2008A) and adopted here are taken from "rates of testing" theory, a framework articulating the rate at which power is retained in high-dimensional testing problems under geometric smoothness constraints. Its basic components are laid out in Ingster (1993), Spokoiny (1996), Lepski and Spokoiny (1999), Horowitz and Spokoiny (2001), and Gayraud and Pouet (2005), among others. Relevant criteria are described fully in Section 3. Within this context, the problem of selecting the $w_{n,j}$ in (1.4) for good asymptotic power would be aptly described as constrained rate-optimization among tests based on tapering, for which $Q_n^{\rm PT}$ is a solution.
There are, however, two types of asymptotic optimality to consider, and it is known that any test based on tapering is suboptimal with respect to the type that is more relevant in practice. Specifically, Ingster (1993) and Spokoiny (1996) respectively deduce "minimax" and "adaptive-minimax" rates, which bound the performance of any test with respect to an adopted smoothness geometry. The former rate defines a stronger form of asymptotic optimality, but the latter "adaptive" type is the more practically relevant one. It is well established that tests based on tapering can achieve Ingster's minimax rate of testing (e.g., Ingster (1993) and Fan, Zhang, and Zhang (2001) provide two distinct examples), but Spitzner (2008A) deduces the suboptimality of any test based on tapering, including that based on $Q_n^{\rm PT}$, with respect to Spokoiny's adaptive-minimax criterion.

Existing tests that are known to achieve adaptive minimaxity are based 
on the alternative test-construction techniques of "truncation" or "threshold- 
ing." Spokoiny (1996) provides an adaptive-minimax test based on thresholding, 
and Fan, Zhang, and Zhang (2001) establish the adaptive minimaxity of Fan's 
(1996) "adaptive Neyman test," a test based on truncation. Details of these 
tests are provided in Section 4. Both have been developed for use in FDA in 
Fan and Lin (1998) and Abramovich et al. (2002). 

Despite its asymptotic suboptimality, the test based on $Q_n^{\rm PT}$ is demonstrated in Section 4 to retain good and even superior power over the adaptive-minimax tests above in non-asymptotic, practically realistic FDA settings, illustrating that the improvements offered by thresholding or truncation, though guaranteed, may arise quite slowly asymptotically. That is, the evidence of this article suggests that, among tapering, truncation, and thresholding, the most powerful tests in typical FDA applications are constructed by tapering!

A novel aspect of this article is that it highlights a distinction between the asymptotic setup of FDA and that of other high-dimensional problems involving smoothness constraints, such as goodness-of-fit testing, within which many of the existing high-dimensional tests were first developed. The particular distinction has to do with the dependence of the dimensionality parameter, $p_n$, on the sample-size parameter $n$. In non-FDA scenarios, there is typically an explicit connection between these two parameters (often $p_n = n$), whereas in FDA the connection is largely hypothetical, and one typically has little or no control over the rate at which $p_n$ increases. Accordingly, an important concern in FDA is the sensitivity of test performance to the rate at which $p_n \to \infty$. It shall be seen that the optimality property of the test based on $Q_n^{\rm PT}$ is robust in this regard.

1.1. Organization 

This paper is organized as follows. Section 2 presents an applied data example, which demonstrates the transformation from (1.1) to (1.2) and the test based on $Q_n^{\rm PT}$. This will provide grounding and intuition for subsequent discussion. The setup and relevance of asymptotic analysis in FDA are discussed in that section as well. Section 3 defines rates-of-testing criteria and presents the paper's main theoretical results. Section 4 gives details of several existing high-dimensional tests, and reports on empirical comparisons carried out by simulation. Concluding discussion appears in Section 5, in which the importance of studying tests based on tapering in FDA is further elaborated.

2. Functional data analysis and its asymptotic framework 

Let us begin discussion with an analysis of the Canadian temperature data of 
Ramsay and Silverman (2005, ch. 13) under a functional linear model. (The data 
are available in a supplemental website to the book.) Subsequently, a general 
asymptotic setup for functional data analysis will be laid out. 

2.1. An example data set 

The Canadian temperature data consist of daily mean-temperature profiles across the year at 31 weather stations in three regions of Canada: there are $M_1 = 14$ "Atlantic" stations, $M_2 = 5$ "Pacific" stations, and $M_3 = 12$ "Continental" stations. The raw measurements are displayed in the top panels of Figure 1. (The original data set analyzed by Ramsay and Silverman also includes three stations in an additional "Arctic" region, and one additional Atlantic station, at "Schefferville." For the present analysis it makes sense to set aside the Arctic stations since there are so few of them, and the Schefferville station since its location is of unusually high latitude relative to the other Atlantic stations.) Ramsay and Silverman remark that the region effects seen in these data are "more complex than the constant or even sinusoidal effects that one might expect," but note specifically that the Pacific stations tend to have warmer winter temperatures and Continental stations tend to have colder winter temperatures, while all regions' summer temperatures tend to be close to the average. The latter observations refer to large-scale attributes of the temperature profiles, and it will be presumed that these and other large-scale attributes are of primary interest. Nevertheless, smaller-scale, "more complex" attributes are not to be ignored; we want a systematic way to explore essentially all aspects of the data. Fourier decomposition provides just such a technique, one that is furthermore most appropriate for data such as yearly temperature profiles whose attributes tend to be periodic.

2.2. Fourier decomposition 

The notation used in this analysis will parallel that of Section 1, but, for simplicity, without the subscript $n$, thereby ignoring any replication concepts until further ideas are laid down. Set $M = M_1 + M_2 + M_3 = 31$ and denote by $Y_k(t)$ the measurement of the $k$'th station at time $t$ within a typical year: the measurement times are $t_1,\dots,t_r$, where $t_1$ is January 1, $r = 365$, and $t_l = l$, so that $t_{l+1} - t_l$ is one day. Next define the Fourier basis functions on $(0, 365]$ according to $\psi_1(t) = 1$, $\psi_{2j}(t) = \sin(j\pi(2t/r - 1))$, and $\psi_{2j+1}(t) = \cos(j\pi(2t/r - 1))$ for $j = 1,2,\dots$ The $j$'th Fourier coefficient of the $k$'th station is then $Y^*_{jk} = r^{-1}\sum_{l=1}^{r} Y_k(t_l)\psi_j(t_l)$, which, up to a constant factor, gives the $j$'th coefficient of a multiple regression of $Y_k(t_l)$ across $t_1,\dots,t_r$ onto any finite set of regressors $\psi_j(t_l)$ across $t_1,\dots,t_r$. Coefficients associated with the first few $\psi_j$ are shown in the bottom panels of Figure 1, centered and scaled for each $j$ so that the mean and sum of squares of the displayed $Y^*_{jk}$ match common values.

Fig 1. Canadian temperature data for weather stations in the Atlantic, Pacific, and Continental regions. The top panels plot average temperatures (°C) across the year, and the bottom panels plot individual sets of Fourier coefficients $Y^*_{jk}$ for $j = 1,\dots,5$.
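The coefficient formula $Y^*_{jk} = r^{-1}\sum_{l=1}^{r} Y_k(t_l)\psi_j(t_l)$ can be sketched in Python with NumPy as below. The function names and the array layout (one station per row, one day per column) are assumptions made for illustration, and the synthetic curve is not taken from the Canadian data.

```python
import numpy as np

def fourier_basis(r, p):
    """Columns psi_1, ..., psi_p evaluated at t_l = l, l = 1, ..., r, with
    psi_1(t) = 1, psi_{2j}(t) = sin(j*pi*(2t/r - 1)),
    psi_{2j+1}(t) = cos(j*pi*(2t/r - 1))."""
    t = np.arange(1, r + 1, dtype=float)
    x = np.pi * (2.0 * t / r - 1.0)
    B = np.empty((r, p))
    for m in range(1, p + 1):              # m is the 1-based index of psi_m
        if m == 1:
            B[:, 0] = 1.0
        elif m % 2 == 0:                   # m = 2j  -> sine with frequency j
            B[:, m - 1] = np.sin((m // 2) * x)
        else:                              # m = 2j+1 -> cosine with frequency j
            B[:, m - 1] = np.cos((m // 2) * x)
    return B

def fourier_coefficients(Y, p):
    """Y has shape (M, r), one curve per row; returns shape (M, p) with
    entry [k, j-1] equal to r^{-1} * sum_l Y_k(t_l) * psi_j(t_l)."""
    Y = np.asarray(Y, dtype=float)
    r = Y.shape[1]
    return Y @ fourier_basis(r, p) / r

# Synthetic example: yearly mean 3 plus a winter/summer contrast of amplitude 2.
r = 365
t = np.arange(1, r + 1)
curve = 3.0 + 2.0 * np.cos(np.pi * (2.0 * t / r - 1.0))
coef = fourier_coefficients(curve[np.newaxis, :], p=65)
# coef[0, 0] recovers the yearly mean 3.0; coef[0, 2] equals 2 * (1/2) = 1.0.
```

On this grid $r^{-1}\sum_l \psi_j(t_l)^2 = 1/2$ for $j \ge 2$, which is why the cosine coefficient comes out as half the curve's amplitude; this constant is the factor separating $Y^*_{jk}$ from the corresponding regression coefficient.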

Consider, for instance, that the shapes of $\psi_1(t) = 1$ and $\psi_3(t) = \cos(\pi(2t/r - 1))$ convey interpretations whereby $Y^*_{1k}$ measures the $k$'th station's yearly average temperature and $Y^*_{3k}$ measures its differential between the winter and summer temperatures. With this in mind, observe from the bottom panels of Figure 1 that the $Y^*_{1k}$ tend to be larger for the Pacific region and smaller for the Continental region, while the $Y^*_{3k}$ tend to be smaller for the Pacific region and larger for the Continental region. This reflects the observations made by Ramsay and Silverman: between these two regions, the yearly average temperature of the Pacific region is warmer (as reflected in the $Y^*_{1k}$) and there is a smaller differential between the winter and summer temperatures (as reflected in the $Y^*_{3k}$). Each of the remaining sets of coefficients describes a distinct attribute of these data. For instance, the $Y^*_{2k}$ describe asymmetries between the spring and fall transition periods, and the $Y^*_{jk}$ with larger $j$ summarize finer periodic attributes of the stations' yearly profiles.

Fourier decomposition of these data is not discussed in Ramsay and Silverman (2005) itself, but it is in the book's supplemental internet materials, within which there appears the remark: "it was decided that 65 basis functions captured enough of the detail in the temperature data . . . ," referring to Fourier basis functions. The present analysis will follow this guideline and consider just the Fourier coefficients $Y^*_{jk}$ with $j = 1,\dots,65$.
Fig 2. Scatterplots of $Y^*_{1k}$ and $Y^*_{41,k}$ by latitude, with region-specific fitted regression lines. The Atlantic stations are indicated by dots (•), the Pacific stations by circles (o), and the Continental stations by asterisks (*).



2.3. Linear models and testing 

Ramsay and Silverman (2005) treat the temperature profiles by an elementary multi-group functional model, $Y_{k(g,l)}(t) = \mu_g(t) + \epsilon_{k(g,l)}(t)$, where $k = k(g,l)$ indexes the $l$'th station of region $g$, $\mu_g(t)$ is the mean temperature profile for the $g$'th region, and the $\epsilon_{k(g,l)}(t)$ are random errors. Their analysis concludes that there are indeed vast differences in temperature profiles among the regions. Consider, however, that one would expect latitude to explain some substantial portion of the variation in temperature from station to station, and moreover that it is surely possible for the dependency of temperature on latitude to change from region to region. Accordingly, the model considered here is an extension of the multi-group model that incorporates latitude as a covariate in such a way as to allow for differences in both intercept and slope across regions. Denoting by $x_k$ the latitude of the $k$'th station, and writing $\bar x = M^{-1}\sum_{k=1}^{M} x_k$, the extended model is $Y_{k(g,l)}(t) = \mu_g(t) + \beta_g(t)(x_{k(g,l)} - \bar x) + \epsilon_{k(g,l)}(t)$, a functional linear model with $P = 6$ regression parameters, $\mu_g(t)$ and $\beta_g(t)$ for $g = 1,2,3$. From this, an analogous model is implied for each set of Fourier coefficients, which for the $j$'th set is

$Y^*_{k(g,l)j} = \mu_{gj} + \beta_{gj}(x_{k(g,l)} - \bar x) + \epsilon_{k(g,l)j}$. (2.1)

Component-specific estimates and tests may be carried out under the model (2.1) using standard linear-regression methodology. To illustrate, Figure 2 displays scatterplots of Fourier coefficients corresponding to $j = 1$ in the left panel and $j = 41$ in the right panel, with estimated region-specific regression lines drawn in. The former panel depicts patterns one would more or less expect to see in these data. There, the $Y^*_{1k}$ are seen to generally decrease as latitude increases, as would be expected of measurements of yearly average temperature. Observe also that the $Y^*_{1k}$ of the Atlantic and Continental stations follow a common trend fairly consistently, whereas those of the Pacific stations fall above that trend. The fitted slopes of all three regions are roughly the same. A very different pattern is seen in the scatterplot of the $Y^*_{41,k}$. Among those coefficients it is seen that the Pacific and Continental stations follow roughly the same trends, though shifted slightly, and that those trends are very different from the trend followed by the Atlantic stations: the $Y^*_{41,k}$ tend to increase with latitude for the former stations but decrease for the latter.

Table 1

P-values for component-specific and global tests comparing the relationships between temperature and latitude across regions. The headings "falls on common trend" and "same slope" refer to the respective null hypotheses $H_0: (\mu_{g_1 j} - \bar\mu_j) + (\beta_{g_1 j} - \bar\beta_j)(\bar x_{g_1} - \bar x) = (\mu_{g_2 j} - \bar\mu_j) + (\beta_{g_2 j} - \bar\beta_j)(\bar x_{g_2} - \bar x)$ and $H_0: \beta_{g_1 j} = \beta_{g_2 j}$, where $(g_1, g_2)$ is taken across all pairs of distinct regions consistent with the row label. The headings $Y^*_{1k}$ and $Y^*_{41,k}$ indicate component-specific tests at $j = 1$ and $j = 41$, respectively, each with $\nu$ numerator degrees of freedom. "Global" p-values are simulated from the null distribution of $F_{\rm global}$.

                          |   | Falls on common trend        | Same slope
Comparison                | ν | Y*_1k   Y*_41,k   Global     | Y*_1k   Y*_41,k   Global
All regions               | 2 | <0.001  <0.001    <0.001     | 0.480   <0.001    0.005
Atlantic to Pacific       | 1 | <0.001  0.019     <0.001     | 0.881   <0.001    0.031
Atlantic to Continental   | 1 | 0.289   <0.001    <0.001     | 0.313   <0.001    0.003
Pacific to Continental    | 1 | <0.001  <0.001    <0.001     | 0.454   0.510     0.180

The pattern associated with the $Y^*_{41,k}$ within a yearly temperature profile is symmetric, with a period of about two and a half weeks ($\psi_{41}(t) = \cos(20\pi(2t/r - 1))$ has period $r/20 \approx 18.3$ days). Its magnitude is quite small, accounting for relative temperature differences of less than one half of one degree. Yet our analysis suggests that its relationship with latitude may distinguish the Atlantic region from the others.

Formal tests of the patterns noted above may be formulated in terms of the parameters of the model (2.1). Corresponding p-values are listed in Table 1. To describe the relevant hypotheses, it will be convenient to define an "overall" regression line at each $j$, $\mathrm{REG}_j(x) = \bar\mu_j + \bar\beta_j(x - \bar x)$, where $\bar\mu_j = M^{-1}\sum_{g=1}^{3} M_g\mu_{gj}$ and $\bar\beta_j = M^{-1}\sum_{g=1}^{3} M_g\beta_{gj}$. Referring to the headings of Table 1, the null hypothesis labeled "falls on common trend" is $H_0: (\mu_{g_1 j} - \bar\mu_j) + (\beta_{g_1 j} - \bar\beta_j)(\bar x_{g_1} - \bar x) = (\mu_{g_2 j} - \bar\mu_j) + (\beta_{g_2 j} - \bar\beta_j)(\bar x_{g_2} - \bar x)$ for $(g_1, g_2) \in G$, for which $\bar x_g = M_g^{-1}\sum_{l=1}^{M_g} x_{k(g,l)}$, where the index set $G$ consists of all pairs of distinct indices among regions indicated in the table's corresponding row label. The null hypothesis labeled "same slope" is $H_0: \beta_{g_1 j} = \beta_{g_2 j}$ for $(g_1, g_2) \in G$.

Each null hypothesis above may be written as a linear hypothesis $H_0: L^T\beta_j = 0$, where $\beta_j = [\mu_{1j}, \mu_{2j}, \mu_{3j}, \beta_{1j}, \beta_{2j}, \beta_{3j}]^T$ and $L$ is a matrix of full column-rank that is determined by the specific hypothesis. The associated test statistic is a ratio of independent "mean-square" statistics, $F_j = MS_j(L)/MSE_j$, which follows an F distribution whose non-centrality parameter is zero only under $H_0$, assuming the errors in (2.1) are independent, homoscedastic, and Gaussian. (Explicit formulas are provided in Section 2.4; see also Seber, 2003.) The associated degrees of freedom are $\nu = \mathrm{rank}\,L$, whose values are indicated in Table 1, and $M - P = 25$.
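For a single $j$, the component-specific F test can be sketched in Python (NumPy and SciPy) using a standard general-linear-hypothesis form of $MS_j(L)$, namely $(L^Tb)^T\{L^T(X^TX)^{-1}L\}^{-1}(L^Tb)/\nu$ with $b$ the least-squares estimate. The latitudes and responses below are simulated placeholders rather than the temperature data, and the helper names are hypothetical.

```python
import numpy as np
from scipy import stats

def design_matrix(group, x):
    """Design for model (2.1): region indicators for mu_1..mu_3, then
    indicator * (x - x_bar) for beta_1..beta_3 (P = 6 columns)."""
    group = np.asarray(group)
    x = np.asarray(x, dtype=float)
    ind = np.column_stack([(group == g).astype(float) for g in (1, 2, 3)])
    return np.hstack([ind, ind * (x - x.mean())[:, np.newaxis]])

def linear_hypothesis_F(y, X, L):
    """F statistic and p-value for H0: L^T b = 0 in y = X b + error,
    assuming independent, homoscedastic Gaussian errors."""
    M, P = X.shape
    nu = L.shape[1]
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                                   # least-squares estimate
    resid = y - X @ b
    mse = resid @ resid / (M - P)                           # MSE_j with M - P error df
    d = L.T @ b
    ms_L = d @ np.linalg.solve(L.T @ XtX_inv @ L, d) / nu   # MS_j(L)
    F = ms_L / mse
    return F, stats.f.sf(F, nu, M - P)

# Hypothetical illustration using the group sizes of the example (14, 5, 12).
rng = np.random.default_rng(0)
group = np.repeat([1, 2, 3], [14, 5, 12])
lat = rng.uniform(42.0, 55.0, size=group.size)    # made-up latitudes
X = design_matrix(group, lat)
# "Same slope" contrasts: region 1 vs region 2, and region 1 vs region 3.
L_12 = np.array([[0, 0, 0, 1, -1, 0]], dtype=float).T
L_13 = np.array([[0, 0, 0, 1, 0, -1]], dtype=float).T
params = np.array([5.0, 8.0, 0.0, -0.5, 0.5, -0.5])   # only region 2's slope differs
y = X @ params + 0.01 * rng.standard_normal(group.size)
F12, p12 = linear_hypothesis_F(y, X, L_12)
F13, p13 = linear_hypothesis_F(y, X, L_13)
```

A hypothesis with $\nu = 2$, such as "same slope" across all regions, is handled by stacking two contrast columns into `L`.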

The p-values in Table 1 reflect the observations made above on Figure 2. Comparing against the standard 0.05 level, the p-values in the column labeled $Y^*_{1k}$ under "falls on common trend" separate out the Pacific region as falling off a common trend that is followed by the other regions. Those in the $Y^*_{1k}$ column under "same slope" indicate no evidence for differences in slope. Similarly, the column labeled $Y^*_{41,k}$ under "same slope" separates out the Atlantic region as having a different relationship with latitude than the others.

Fig 3. P-values for component-specific tests of "same slope." The null hypotheses are $H_0: \beta_{g_1 j} = \beta_{g_2 j}$, for region indices $(g_1, g_2)$ indicated above each plot. The vertical axis is on a logarithmic scale; reference levels are indicated at $\alpha = 0.001$ and $\alpha = 0.05$.

There remains the question of how to carry out these tests "globally." That is, can the same hypotheses be tested on the whole of the temperature profiles, inasmuch as they are represented by 65 sets of Fourier coefficients? This type of question is the central concern of this article. A starting point is to consider the plots in Figure 3, which chart the p-values from component-specific tests of "same slope" across all sets of Fourier coefficients, $j = 1,\dots,65$, each panel corresponding to a separate pair of regions. The most significant differences in slope are reflected in the very small p-values displayed in the two leftmost panels, at $j = 41$. Other p-values are "small" as well, in the sense of falling below the 0.05 level, even in the rightmost panel, but nowhere near as small as these two at $j = 41$. However, in light of there being 65 test results to examine per panel, it is no surprise to find at least a handful of small p-values: under the null hypothesis, about $65 \times 0.05 \approx 3$ of them would be expected to fall below 0.05 by chance alone. Of interest, then, is to deduce a single assessment for each panel which combines the p-values across all $j = 1,\dots,65$.

Well suited to this task is a test statistic constructed by tapering, in a manner similar to (1.4). For the present situation, let us define this statistic as $F_{\rm global} = \sum_{j=1}^{65} w_j F_j$ and set the weights to $w_j = j^{-1/2}$, paralleling the construction of $Q_n^{\rm PT}$. By combining the individual test statistics $F_j$ this way, $F_{\rm global}$ is globally sensitive, but it down-weights the influence of the $Y^*_{jk}$ with larger $j$, as is desired to reflect primary interest in the larger-scale shapes.

Independence shall be assumed among the $Y^*_{jk}$ across $j$ (which is justified in Section 2.4), so that the null distribution of $F_{\rm global}$ is fully defined. Simulated p-values (using one million iterations) for global versions of the null hypotheses discussed above are listed in the columns of Table 1 labeled "global." Comparing against 0.05, those in the "same slope" portion of the table again separate out the Atlantic region as having a different relationship with latitude than the others, but this conclusion now accounts for essentially all aspects of the temperature profiles.
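The simulated p-values for $F_{\rm global}$ can be sketched as follows in Python with NumPy, under the independence assumption just stated together with Gaussian errors, so that each $F_j$ is an independent $F(\nu, M - P)$ variable under $H_0$. The function names are hypothetical, and the default of 100,000 draws is deliberately smaller than the one million used for Table 1, to keep the sketch fast.

```python
import numpy as np

def f_global(F, weights=None):
    """F_global = sum_j w_j F_j with default tapering weights w_j = j^{-1/2}."""
    F = np.asarray(F, dtype=float)
    if weights is None:
        weights = np.arange(1, F.size + 1) ** -0.5
    return float(np.sum(np.asarray(weights, dtype=float) * F))

def f_global_pvalue(f_obs, nu, df_error, p=65, n_sim=100_000, seed=0):
    """Simulated p-value for F_global, treating F_1, ..., F_p as independent
    F(nu, df_error) variables under H0."""
    rng = np.random.default_rng(seed)
    w = np.arange(1, p + 1) ** -0.5
    sims = np.zeros(n_sim)
    for j in range(p):                      # one component at a time to limit memory
        sims += w[j] * rng.f(nu, df_error, size=n_sim)
    return float(np.mean(sims >= f_obs))
```

For a "same slope" comparison between two regions in the example, one would call `f_global_pvalue(f_global(F), nu=1, df_error=25)` with `F` holding the 65 component statistics.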
2.4. An asymptotic framework for functional data analysis

The sort of analysis carried out on the Canadian temperature data is now cast 
in the general context described in Section 1. The transformation from the 
continuous model (1.1) to the discrete model (1.2) will be laid out in detail, 
and its properties discussed. Hereafter, the notation will revert to that of the replicated model, in which there is an explicit reference to a sample-size parameter, $n$.

Regarding the setup for asymptotic analysis, there is some imprecision in defining replication strictly according to the model (1.1), for the presence of a continuous covariate would make it unrealistic to expect exact duplication of the regressor matrix, $X$, except in carefully designed experiments. Absent a continuous covariate, however, many multi-group experiments do allow the parameterization $M_g = N_g n$, where the $M_g$ are group-specific sample sizes and the "sample size" $n$ is some common divisor among them. In such cases the matrix $X$ would consist entirely of zeros and ones, and $N = \sum_g N_g$. The notation adopted in the Canadian temperature example is intended to suggest such a multi-group experiment; yet the corresponding replication concept is unrealistic, for if any additional weather stations were sampled, one would not expect the latitudes of those stations to match any of those in the current data set.

Such complications notwithstanding, the purpose of introducing the sample-size parameter, $n$, is to manifest the notion that an increase in the amount of data collected is coupled with a decrease in error variability. This is apparent in expression (1.2), which shows the magnitude of error in the discrete model to shrink at the rate $n^{-1/2}$. Some readers might prefer to reinterpret the asymptotic formulation used in this article as one in terms of shrinking errors, in which case the results presented here would translate directly. Otherwise, a "pure" interpretation of replication in the model (1.1) might take $n = 1$ in an analysis of current data, and treat any future replication as entirely hypothetical; or one might employ various conceptual devices, such as sampling covariates from a distribution, to modify the model (1.1) and its translation to (1.2) so as to make replication more realistic. Despite these possibilities, the perspective taken here is that the model (1.1) is entirely adequate for illustrating the key ideas of present interest, and any modification would only add technical complications that are tangential to them.

Regarding data collection and Fourier decomposition, the functional measurements are assumed to have been taken along a dense, finite grid that is common to all $Y_i(t)$, as in the Canadian temperature data. For $t$ in the domain $(a,b]$, the points of the grid are taken to be $t_l = a + (b - a)l/r$ for $l = 1,\dots,r$ and some fixed, large $r$ (which may not be $p_n$). The data associated with the curve $Y_{ik}(t)$ are $Y_{ik}(t_1),\dots,Y_{ik}(t_r)$. (In more general situations the grid may change from curve to curve, but to avoid additional complication it will be assumed that a good approximation to the present setup is available, e.g., by interpolating measurements onto a fixed grid.) Set $\psi_1(t) = 1$, $\psi_{2j}(t) = \sin(\pi j\{2(t - a)/(b - a) - 1\})$, and $\psi_{2j+1}(t) = \cos(\pi j\{2(t - a)/(b - a) - 1\})$ for $j = 1,2,\dots$ Writing $Y^*_{ij} = [Y^*_{ij1},\dots,Y^*_{ijN}]^T$, the Fourier coefficients $Y^*_{ijk}$ are calculated by the formula

$Y^*_{ij} = \frac{1}{r}\sum_{l=1}^{r} Y_i(t_l)\,\psi_j(t_l)$. (2.2)


To make the final step to the discrete model (1.2), a linear transformation involving $L$ is used to tailor the statistics (2.2) to the linear hypothesis. Set $H = \{L^T(X^TX)^{-1}L\}^{-1/2}$ and define the $Y_{n,j}$ in (1.2) according to

$Y_{n,j} = \frac{1}{n\sigma_j}\sum_{i=1}^{n} H L^T (X^TX)^{-1} X^T Y^*_{ij}$, (2.3)

where $\sigma_j^2 = V[Y^*_{ijk}]$. The remaining objects defining (1.2) are

$\theta_j = \frac{1}{\sigma_j r}\sum_{l=1}^{r} H L^T \beta(t_l)\,\psi_j(t_l)$ and $\epsilon_{n,j} = \frac{\sigma}{\sigma_j r\sqrt{n}}\sum_{i=1}^{n}\sum_{l=1}^{r} H L^T (X^TX)^{-1} X^T \epsilon_i(t_l)\,\psi_j(t_l)$. (2.4)

An often-appropriate assumption has each $\epsilon_{i,k}$ a stationary process such that $\epsilon_{i,k}(t_l) = \sum_{m=-\infty}^{\infty}\gamma_m\,\eta_{i,k}(t_l - (b - a)m/r)$, for which $\eta_{i,k}(a + (b - a)m/r)$ is, across integer $m$, a mean-zero independent and identically distributed sequence with finite fourth moment, and $\sum_{m=-\infty}^{\infty}|\gamma_m| < \infty$. When this assumption is valid, Theorem 10.3.2.i of Brockwell and Davis (1991) implies that each $\mathrm{Cov}(Y^*_{ijk}, Y^*_{ij'k}) \to 0$ as $r \to \infty$, for $j \neq j'$. (See also Corollary 3.1.1 in Section 3, below.) Thus, Fourier decomposition provides a means to decorrelate the functional linear model, while the statistics (2.2) or (2.3) capture its core structure.
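A quick simulation can illustrate this decorrelation. The Python sketch below assumes a finite moving average (only $\gamma_0, \gamma_1, \gamma_2$ nonzero, with made-up values), standard normal innovations $\eta$, and unit grid spacing; it computes the first five Fourier coefficients of many independent error curves and reports their largest empirical correlation. All names and values are illustrative.

```python
import numpy as np

def ma_errors(n_rep, r, gamma, rng):
    """n_rep realizations of a stationary moving average on r grid points:
    eps_l = sum_m gamma[m] * eta_{l-m}, with i.i.d. standard normal eta."""
    gamma = np.asarray(gamma, dtype=float)
    q = gamma.size - 1
    eta = rng.standard_normal((n_rep, r + q))          # q extra values for the lags
    return np.stack([np.convolve(row, gamma, mode="valid") for row in eta])

rng = np.random.default_rng(2)
r, n_rep = 365, 4000
x = np.pi * (2.0 * np.arange(1, r + 1) / r - 1.0)
# psi_1, ..., psi_5 on the grid (the basis of Section 2.2 with a = 0, b = r)
psi = np.column_stack([np.ones(r), np.sin(x), np.cos(x), np.sin(2 * x), np.cos(2 * x)])
eps = ma_errors(n_rep, r, gamma=[1.0, 0.6, 0.3], rng=rng)
coef = eps @ psi / r                       # Fourier coefficients of each error curve
corr = np.corrcoef(coef, rowvar=False)     # 5 x 5 correlations across replications
off_diag = np.abs(corr[~np.eye(5, dtype=bool)])
print(f"largest |correlation| between distinct coefficients: {off_diag.max():.3f}")
```

Because of edge effects and sampling noise the empirical correlations are not exactly zero, but for $r = 365$ they should be close to it, even though neighbouring error values are strongly correlated.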

In the typical case where the $\sigma_j^2$ are unknown, the test statistic $Q_n$ in (1.4) would be replaced by $\hat Q_n = n\sum_{j=1}^{p_n} w_{n,j}\|\hat Y_{n,j}\|^2$, where $\hat Y_{n,j}$ is defined as in the right side of (2.3) but with an estimate $\hat\sigma_j$ substituting for $\sigma_j$. For instance, the $\hat\sigma_j^2$ may be the usual unbiased estimates

$\hat\sigma_j^2 = \frac{1}{nN - P}\sum_{i=1}^{n}\sum_{k=1}^{N}(Y^*_{ijk} - x_k^T\hat\beta^*_j)^2$, (2.5)

where

$\hat\beta^*_j = \frac{1}{n}\sum_{i=1}^{n}(X^TX)^{-1}X^TY^*_{ij}$.

This modification led to the statistic $F_{\rm global}$ in the Canadian temperature example, for which $F_j = MS_j(L)/MSE_j$ has

$MS_j(L) = n^{-1}\|\sum_{i=1}^{n} H L^T(X^TX)^{-1}X^TY^*_{ij}\|^2/\nu$ and $MSE_j = \hat\sigma_j^2$,

so that $F_{\rm global} = \sum_j w_jF_j = \hat Q_n/\nu$. Fan and Lin (1998, sec. 3.4) discuss other estimates of $\sigma_j^2$ based on smoothing across $j$, which are specialized for the Fourier decomposition technique.
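The estimated statistic $\hat Q_n$ can be sketched in Python with NumPy, following (2.3) and (2.5) directly. The array layout `Ystar[i, j, k]` for $Y^*_{ijk}$, the helper names, and the random design and data are assumptions for illustration. The final lines check numerically that, for $n = 1$, $\hat Q_n/\nu$ coincides with $\sum_j w_jF_j$ when each $F_j$ is computed from the classical form $(L^Tb_j)^T\{L^T(X^TX)^{-1}L\}^{-1}(L^Tb_j)/(\nu\,MSE_j)$; the two agree because $H^2 = \{L^T(X^TX)^{-1}L\}^{-1}$.

```python
import numpy as np

def inv_sqrt_spd(A):
    """Symmetric inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs / np.sqrt(vals)) @ vecs.T

def q_hat(Ystar, X, L, w):
    """Q_hat_n = n * sum_j w_j ||Y_hat_{n,j}||^2, with sigma_j^2 estimated by (2.5).

    Ystar has shape (n, p, N) with Ystar[i, j, k] = Y*_{ijk}; X is N x P; L is P x nu.
    Returns Q_hat_n and the vector of variance estimates sigma_hat_j^2.
    """
    n, p, N = Ystar.shape
    P = X.shape[1]
    XtX_inv = np.linalg.inv(X.T @ X)
    H = inv_sqrt_spd(L.T @ XtX_inv @ L)                      # {L^T (X^T X)^{-1} L}^{-1/2}
    A = H @ L.T @ XtX_inv @ X.T                              # nu x N
    Ybar = Ystar.mean(axis=0)                                # p x N, average over i
    beta_hat = Ybar @ (XtX_inv @ X.T).T                      # p x P, row j is beta*_hat_j
    resid = Ystar - (beta_hat @ X.T)[np.newaxis]             # Y*_{ijk} - x_k^T beta*_hat_j
    sigma2 = np.sum(resid ** 2, axis=(0, 2)) / (n * N - P)   # (2.5)
    Y_hat = (Ybar @ A.T) / np.sqrt(sigma2)[:, np.newaxis]    # (2.3) with sigma_hat_j
    return n * np.sum(w * np.sum(Y_hat ** 2, axis=1)), sigma2

# Made-up data with n = 1, N = 31, P = 6, p = 65 and a two-column L (nu = 2).
rng = np.random.default_rng(3)
N, P, p, nu = 31, 6, 65, 2
X = rng.standard_normal((N, P))
L = np.zeros((P, nu))
L[3, 0], L[4, 0] = 1.0, -1.0
L[3, 1], L[5, 1] = 1.0, -1.0
Ystar = rng.standard_normal((1, p, N))
w = np.arange(1, p + 1) ** -0.5
Q, sigma2 = q_hat(Ystar, X, L, w)

# Classical component statistics F_j, then F_global = sum_j w_j F_j.
XtX_inv = np.linalg.inv(X.T @ X)
d = Ystar[0] @ (XtX_inv @ X.T).T @ L            # p x nu, row j is (L^T b_j)^T
C_inv = np.linalg.inv(L.T @ XtX_inv @ L)
F = np.einsum("ja,ab,jb->j", d, C_inv, d) / nu / sigma2
F_global = np.sum(w * F)
```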

Regarding the relationship between $p_n$ and $n$, the reader should notice that $p_n$ never appears in the transformation from (1.1) to (1.2), and so these two parameters are never actually connected. The parameter $p_n$ is constrained by the resolution of the grid $t_1,\dots,t_r$, which to avoid numerical error requires $p_n \le r$. Moreover, it is possible that $p_n$ would be set according to subjective modeling assumptions, such as an analyst's determination of the number of basis functions needed to describe the data, as was made in the Canadian temperature example. In Spitzner, Marron, and Essick (1998), $p_n$ is set subjectively to avoid observed defects in the ability of the $\psi_j$ to decorrelate the model at larger $j$. At any rate, the justification for taking $n \to \infty$ and $p_n \to \infty$ is that it forms an appropriate abstract conceptualization for repeated measurement of functional data in accordance with a global point of view. In particular, $p_n \to \infty$ represents a situation where the grid $t_1,\dots,t_r$ becomes increasingly dense, and if $n \to \infty$ as well, then potentially all available information about the curve model will be captured in the limit.



3. Rates of testing for the tapering mechanism 



The discussion now turns to rates-of-testing theory and the asymptotic optimality of $Q_n^{\rm PT}$ among tests based on tapering. The more technical aspects of this discussion have been omitted, but can be found in Spitzner (2008A).

Smoothness constraints are formally defined within rates-of-testing theory as a restriction of the functional parameter $\beta(t)$ in (1.1) to a smooth-function class. In the most general settings, this would be a Besov class, but here it is taken to be a Sobolev class, a special case, which is appropriate when working with Fourier decompositions. Such constraints may be expressed as a restriction of the mean vectors of the discrete model (1.2) to the geometry

$\mathcal{B}_{s,M} = \left\{(\theta_1,\theta_2,\dots) : \left(\sum_{j=1}^{\infty} j^{2s}\|\theta_j\|^2\right)^{1/2} \le M\right\}$, (3.1)

a Sobolev ellipsoid of radius $M$ in infinite-dimensional discrete space, where $M > 0$ and $s > 1/2$ are fixed constants. The notation $\tilde s = 4s + 1 > 3$ shall also be used. The bound on the norm in (3.1) models smoothness by restricting expression of the higher-indexed $\theta_j$, with larger $s$ making the restriction stronger. Moreover, Parseval's identity implies that $(\theta_1,\theta_2,\dots) \in \mathcal{B}_{s,M}$ is equivalent to the assumption that the corresponding $\beta(t)$ in (1.1), assuming the $\theta_j$ arise through (2.4), is an element of a Sobolev ellipsoid in continuous space, $\{\beta(t) = [\beta_1(t),\dots,\beta_P(t)]^T : \int_a^b \|\beta^{(s_c)}(t)\|^2\,dt \le M_c\}$ for some $s_c = 1,2,\dots$ and $M_c > 0$, which are easily determined. (For details and further discussion, see Adams and Fournier, 2003.)
The "rates" in rates-of-testing theory, which characterize test performance,
are described as follows. Fix $s > 1/2$ and $M > 0$, and for each $n$ let $\phi_n =
\phi_n(Y_{n,1}, \ldots, Y_{n,p_n})$ be a test of (1.3) such that $\lim_n P_0[\phi_n = 1] \le \alpha$ for a
fixed level $\alpha \in (0, 1)$. The notation $P_\theta$ is here used to denote probabilities under
the model (1.2) for a specific $\theta = (\theta_1, \theta_2, \ldots)$ and fixed $n$. Rates-of-testing
criteria are formulated from sequences $\delta_n \to 0$ satisfying

$$\inf_{\theta \in H_1(\delta_n/\delta_n^*;\, s, M)} P_\theta[\phi_n = 1] \to 1 \quad \text{for every } \delta_n^* \to 0, \qquad (3.2)$$

where

$$H_1(\delta; s, M) = \Big\{ \theta \in \mathcal{B}_{s,M} : \Big( \sum_{j=1}^{\infty} \|\theta_j\|^2 \Big)^{1/2} \ge \delta \Big\}. \qquad (3.3)$$

The criterion (3.2) describes the rate at which a gap may shrink between the
null hypothesis and a class of "distinguishable" alternatives, those the test would
be able to detect with high power, asymptotically. The better-performing tests
allow this gap to shrink faster: if for some $\delta_n \to 0$ the criterion (3.2) is satisfied
for one test, but not another, the former test is preferred.
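As a small illustration of (3.1) and (3.3), the following minimal sketch checks whether a finitely supported mean sequence lies in $\mathcal{B}_{s,M}$ and in $H_1(\delta; s, M)$. The helper names are hypothetical and not part of the article's methodology; the sketch assumes $\theta_j = 0$ beyond the supplied entries and reads the bound in (3.1) as $M^2$.

```python
import numpy as np

def sobolev_sq_norm(theta, s):
    """Return sum_j j^{2s} ||theta_j||^2; theta has shape (J,) or (J, v), indexed from j = 1."""
    theta = np.asarray(theta, dtype=float)
    if theta.ndim == 1:
        theta = theta[:, None]          # treat scalars as 1-vectors (v = 1)
    j = np.arange(1, theta.shape[0] + 1)
    return float(np.sum(j ** (2.0 * s) * np.sum(theta ** 2, axis=1)))

def in_ellipsoid(theta, s, M):
    """Membership in B_{s,M}: sum_j j^{2s} ||theta_j||^2 <= M^2 (entries past J taken as 0)."""
    return sobolev_sq_norm(theta, s) <= M ** 2

def in_H1(theta, delta, s, M):
    """Membership in H_1(delta; s, M): inside B_{s,M} with Euclidean norm at least delta."""
    norm = float(np.sqrt(np.sum(np.asarray(theta, dtype=float) ** 2)))
    return in_ellipsoid(theta, s, M) and norm >= delta

# Two spikes of equal Euclidean size 0.1, at j0 = 2 and at j0 = 40.
spike_low = np.zeros(50)
spike_low[1] = 0.1
spike_high = np.zeros(50)
spike_high[39] = 0.1
print(in_H1(spike_low, 0.05, s=1.0, M=1.0), in_H1(spike_high, 0.05, s=1.0, M=1.0))  # prints: True False
```

The high-index spike has the same Euclidean size but uses sixteen times the budget $M^2 = 1$ at $s = 1$, which is the sense in which (3.1) restricts the higher-indexed $\theta_j$.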

Ingster's (1993) minimax performance bound states that for no test does any
$\delta_n = o(n^{-2s/\tilde{s}})$ satisfy (3.2), but there is a test (based on tapering) for which
$\delta_n = n^{-2s/\tilde{s}}$ satisfies (3.2). This identifies the rate $\delta_n^{\mathrm{M}}(s) = n^{-2s/\tilde{s}}$ as minimax
for the geometry $\mathcal{B}_{s,M}$ at a specific $s$. Suppose now that fixed bounds $1/2 < s_* < s^*$
are given, and for each $s_* \le s \le s^*$ one is to consider a separate sequence
$(\delta_n(s))$, and set $\delta_n^{\mathrm{AM}}(s) = \{n^2 (\log\log n)^{-1}\}^{-s/\tilde{s}}$. Spokoiny (1996) establishes
that for no test is (3.2) satisfied across $s_* \le s \le s^*$ if $\delta_n(s) = o(\delta_n^{\mathrm{AM}}(s))$ for
some such $s$. It is also shown there that a test (based on thresholding) exists for which
$\delta_n(s) = \delta_n^{\mathrm{AM}}(s)$ does satisfy (3.2) across $s_* \le s \le s^*$. This identifies the rates
$\delta_n^{\mathrm{AM}}(s)$ as adaptive-minimax for $\mathcal{B}_{s,M}$ across $s_* \le s \le s^*$. (The optimal tests
alluded to here are the same as those mentioned in Section 1, and are described in
Section 4.)

The main technical result for evaluating tests based on tapering rewrites the 
criterion (3.2) in terms of the parameters of the test statistic (1.4). 

Theorem 3.1. Assume the model (1.2) and suppose $(Q_n)$ is a sequence of test
statistics with each $Q_n$ as in (1.4) for associated sequences $(w_{n,j})$ and $(p_n)$ such
that each $0 \le w_{n,j} \le 1$ and $p_n \to \infty$ as $n \to \infty$. Set $S_n(p) = w_{n,1}^2 + \cdots + w_{n,p}^2$,
$W_n(p) = \min\{w_{n,j}^2 : j \le p\}$, $U_n(p, q) = q W_n(q) / S_n(p)$, and $U_n(p) = U_n(p, p)$.
Suppose at each $n$ the $\epsilon_{n,jk}$ are independent across $k$, and

$$\mathrm{Var}_0\Big[ \sum_{j=1}^{p_n} w_{n,j} \frac{\|Y_{n,j}\|^2}{E_0[\|Y_{n,j}\|^2]} \Big] = O(S_n(p_n)). \qquad (3.4)$$

Suppose further that each $\epsilon_{n,jk}$ is such that $E[\epsilon_{n,jk}] = 0$, $\mathrm{Var}[\epsilon_{n,jk}] = 1$, and
$E[\epsilon_{n,jk}^4] \asymp 1$ uniformly across $j$, $k$, and $n$, and $P[\epsilon_{n,jk} < -t] > 0$ for each $t > 0$. Let
$(\delta_n)$ be some positive sequence for which $\delta_n \to 0$. Fix $\alpha$ and, for each $n$, let $\phi_n^Q$
denote the size-$\alpha$ test which rejects the null hypothesis in (1.3) when $Q_n$ exceeds
some critical value. For fixed $s > 1/2$, $M > 0$, and $\phi_n = \phi_n^Q$, the criterion (3.2)
holds if, and only if, both

(i.) $\limsup_{n \to \infty} n^2 U_n(p_n)\, p_n^{-\tilde{s}} < \infty$ and (3.5)

(ii.) $\liminf_{n \to \infty} n^2 U_n(p_n, q_n)\, q_n^{-\tilde{s}} > 0$,

where $q_n = \{\delta_n / M\}^{-1/s}$. The same conclusion holds if the $Y_{n,jk}$ in $Q_n$ are
replaced with $Y_{n,jk}(1 + o_p(1))$, provided $\mathrm{Cov}(Y_{n,jk}^2 (1 + o_p(1)), Y_{n,jl}^2 (1 + o_p(1))) \to
\mathrm{Cov}(Y_{n,jk}^2, Y_{n,jl}^2)$ for each $j$, $k$, and $l$.

Proof. This is Theorem 1 of Spitzner (2008A). □
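The quantities $S_n$, $W_n$, and $U_n$ in Theorem 3.1 are simple functions of the weight sequence. The hypothetical helper below computes them for a given weight vector. For the weights $w_{n,j} = j^{-1/2}$ of $Q_n^{\mathrm{PT}}$ one has $W_n(q) = 1/q$, so $U_n(p, q) = 1/S_n(p)$ for every $q$, and $S_n(p)$ is the harmonic number $H_p \approx \log p$.

```python
import numpy as np

def taper_summaries(w, p, q):
    """Return (S_n(p), W_n(q), U_n(p, q)) for weights w[j-1] = w_{n,j} of a statistic (1.4)."""
    w = np.asarray(w, dtype=float)
    if not (1 <= p <= w.size and 1 <= q <= w.size):
        raise ValueError("p and q must lie between 1 and len(w)")
    S_p = float(np.sum(w[:p] ** 2))     # S_n(p) = w_{n,1}^2 + ... + w_{n,p}^2
    W_q = float(np.min(w[:q] ** 2))     # W_n(q) = min{ w_{n,j}^2 : j <= q }
    return S_p, W_q, q * W_q / S_p      # U_n(p, q) = q W_n(q) / S_n(p)

p = 1000
j = np.arange(1, p + 1)
S, W, U = taper_summaries(j ** -0.5, p, 200)   # weights of Q_n^PT
print(S, U * S)                                # S_n(p) is the harmonic number H_p
```

For unweighted (flat) weights the same helper gives $U_n(p, q) = q/p$, which is what makes the truncation point so influential for an unweighted quadratic form.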

An important corollary of this theorem establishes its validity under assumptions
on the covariance structure of the continuous model (1.1) that would typically
be made in practice. The case of unknown variances is also treated.

Corollary 3.1.1. Suppose the model (1.2) derives from the functional linear
model (1.1) via the transformation (2.3), and assume the notation of Section
2.4. The conclusions of Theorem 3.1 hold true under the following statement (i)
and remain true under statement (ii).

(i) Each $\epsilon_i$ in (1.1) is such that $\epsilon_{ik}(t_l) = \sum_{m=-\infty}^{\infty} \gamma_m \eta_{ik}(t_l - (b-a)m/r)$,
for which the coefficients $\gamma_m$ satisfy $\sum_{m=-\infty}^{\infty} |\gamma_m|\, |m|^{1/2} < \infty$; the $\eta_{ik}(a +
(b-a)m/r)$ are independent and identically distributed across integer $m$
with $E[\{\eta_{ik}(a + (b-a)m/r)\}^4] < \infty$; and the weight sequence $(w_{n,j})$ does
not increase in $j$ for each $n$.

(ii) $Q_n$ is replaced by $\hat{Q}_n$, substituting for $\sigma_j^2$ the usual unbiased estimates
$\hat{\sigma}_{n,j}^2$ defined in (2.5).

Proof. Under the conditions of statement (i), and assuming $p_n < r$, Theorem
10.3.2(ii) of Brockwell and Davis (1991) implies that $p_n \mathrm{Cov}(Y_{jl}^{*2}, Y_{kl}^{*2})$ is uniformly
bounded across $j, k = 1, \ldots, p_n$ with $j \ne k$. From this comes the property
$\mathrm{Cov}(Y_{jl}^2, Y_{kl}^2) \le B/(n^2 p_n)$ for some $B$ across $j, k = 1, \ldots, p_n$ with $j \ne k$.
Thus, if $(w_{n,j})$ is as indicated, so that $w_{n,k} \le w_{n,j}$ for $k > j$, one has

$$n^2 \sum_{j=1}^{p_n - 1} \sum_{k=j+1}^{p_n} \sum_{l=1}^{v} w_{n,j} w_{n,k}\, \mathrm{Cov}(Y_{jl}^2, Y_{kl}^2) \le \frac{Bv}{p_n} \sum_{j=1}^{p_n - 1} (p_n - j)\, w_{n,j}^2 \le Bv\, S_n(p_n),$$

and so the criterion (3.4) is satisfied. Statement (ii) is readily verified using the
delta rule to show that $E[\hat{\sigma}_j^{-2}] \to \sigma_j^{-2}$, $\mathrm{Cov}(\hat{\sigma}_j^{-2}, \hat{\sigma}_k^{-2}) \to 0$, and $\mathrm{Cov}(Y_{n,jk}, \hat{\sigma}_j^{-2})
\to 0$, from which it follows that the covariances of the squared standardized components
converge as required at the end of Theorem 3.1. □



D.J. Spitzner/A powerful test based on tapering in FDA 






In Spitzner (2008A), Theorem 3.1 is applied to characterize and bound the
rates-of-testing performance of tests based on tapering. The key results most
relevant to present purposes are summarized as follows.

Corollary 3.1.2. Assume the notation and conditions of Theorem 3.1.

(i) Suppose (3.5.i) holds and $\delta_n \to 0$ is such that $n^2 U_n(p_n, q_n)\, q_n^{-\tilde{s}} \asymp 1$,
where $q_n = \{\delta_n/M\}^{-1/s}$. Then the sequence $(\delta_n)$ defines a "boundary
rate" of the test $\phi_n^Q$ in the sense that, for any sequence $(\tilde{\delta}_n)$, the criterion
(3.2) holds for $(\tilde{\delta}_n)$ if $\delta_n = O(\tilde{\delta}_n)$ but not if $\tilde{\delta}_n = o(\delta_n)$.

(ii) Set $\delta_n^Q(s) = \{n^2 (\log n)^{-1}\}^{-s/\tilde{s}}$ and suppose $1/2 < s_* < s^*$. For no test
$\phi_n^Q$ based on tapering is (3.2) satisfied for $\delta_n(s) = o(\delta_n^Q(s))$ across $s_* \le s \le s^*$. Moreover, if
$w_{n,j} = j^{-1/2}$, as in $Q_n^{\mathrm{PT}}$, and $p_n$ is such that $\{n^2 (\log n)^{-1}\}^{1/3}/p_n$ is
bounded and $\log p_n \asymp \log n$, then $\phi_n^Q$ satisfies (3.2) for $\delta_n(s) = \delta_n^Q(s)$ across
$s_* \le s \le s^*$. Thus, $\delta_n^Q(s)$ is an "optimal adaptive rate of testing for the
tapering mechanism."

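Proof. Statement (i) and the first assertion of (ii) follow immediately from Theorems
2 and 3, respectively, of Spitzner (2008A). To prove the second assertion
of (ii), first observe that the specified settings imply that $\{n^2(\log n)^{-1}\}^{1/\tilde{s}}/p_n$ is
bounded, since $\tilde{s} > 3$, and that $S_n(p_n) \asymp \log p_n$. Since $W_n(p) = p^{-1}$ for these weights,
$U_n(p_n) = U_n(p_n, q) = 1/S_n(p_n)$ for any $q$, and it follows that the sequence in
(3.5.i) has

$$n^2 U_n(p_n)\, p_n^{-\tilde{s}} \asymp n^2 p_n^{-\tilde{s}} / \log p_n = \{(\log n)/(\log p_n)\} \big[ \{n^2 (\log n)^{-1}\}^{1/\tilde{s}} / p_n \big]^{\tilde{s}},$$

which is bounded. Next set $\delta_n = \delta_n^Q(s)$, so that $q_n = M^{1/s} \{n^2 (\log n)^{-1}\}^{1/\tilde{s}}$,
and observe

$$n^2 U_n(p_n, q_n)\, q_n^{-\tilde{s}} \asymp n^2 q_n^{-\tilde{s}} / \log p_n = M^{-\tilde{s}/s} (\log n)/(\log p_n),$$

which is $\asymp 1$ because $\log p_n \asymp \log n$. Thus the conditions of statement (i) hold,
and $\delta_n^Q(s)$ is a boundary rate for this test. □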
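The two sequences checked in this proof can also be evaluated numerically. The sketch below is illustrative only: it assumes $p_n = n$ (one choice satisfying the stated conditions on $p_n$), treats $q_n$ as a real number, and uses $U_n(p_n) = U_n(p_n, q_n) = 1/S_n(p_n)$ for $w_{n,j} = j^{-1/2}$. With $p_n = n$ the second sequence equals $M^{-\tilde s/s}\log n / H_n$, which approaches $M^{-\tilde s/s}$ slowly.

```python
import numpy as np

def proof_sequences(n, s, M):
    """Sequences of (3.5.i) and (3.5.ii) for Q_n^PT at delta_n = delta_n^Q(s), with p_n = n."""
    s_tilde = 4.0 * s + 1.0
    p_n = n                                             # illustrative choice: log p_n ~ log n
    S = float(np.sum(1.0 / np.arange(1, p_n + 1)))      # S_n(p_n) = H_{p_n} for w_{n,j} = j^{-1/2}
    delta_Q = (n ** 2 / np.log(n)) ** (-s / s_tilde)    # delta_n^Q(s)
    q_n = (delta_Q / M) ** (-1.0 / s)                   # q_n = {delta_n / M}^{-1/s}
    seq_i = n ** 2 * float(p_n) ** (-s_tilde) / S       # n^2 U_n(p_n) p_n^{-s~}
    seq_ii = n ** 2 * q_n ** (-s_tilde) / S             # n^2 U_n(p_n, q_n) q_n^{-s~}
    return seq_i, seq_ii

for n in (10 ** 3, 10 ** 5, 10 ** 6):
    print(n, proof_sequences(n, s=1.0, M=1.0))
```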

The first statement of Corollary 3.1.2 is useful to deduce the $\delta_n \to 0$ such
that (3.2) holds for a specific test, as is demonstrated in the proof of the second
statement. The second statement is particularly important in that it establishes
the concept of adaptive optimality among tests based on tapering, and characterizes
the associated performance bound via the sequence $\delta_n^Q(s)$. Observe that
the optimal adaptive rate identified in that statement is slower than Spokoiny's
adaptive-minimax rate: $\delta_n^{\mathrm{AM}}(s) = o(\delta_n^Q(s))$. This of course means there are tests
that would asymptotically outperform any test based on tapering in the adaptive
context, such as Spokoiny's (1996) test based on thresholding or Fan's (1996)
adaptive Neyman test. Nevertheless, within the class of tests based on tapering,
the test based on $Q_n^{\mathrm{PT}}$, with $p_n$ as in Corollary 3.1.2.ii, is adaptively optimal.
Moreover, although the condition $\log p_n \asymp \log n$ does not allow $p_n \to \infty$ at
an arbitrarily fast rate, it nevertheless gives the dimensionality parameter fairly
wide leeway. This property makes such a test particularly suited for use in FDA.
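To see how slowly the gap between these rates opens up, the sketch below evaluates $\delta_n^{\mathrm{M}}(s)$, $\delta_n^{\mathrm{AM}}(s)$, and $\delta_n^{Q}(s)$ for a few sample sizes. It is purely illustrative arithmetic on the formulas above and ignores constants, which the rates-of-testing framework does not specify. The ratio $\delta_n^{\mathrm{AM}}(s)/\delta_n^Q(s) = \{\log\log n/\log n\}^{s/\tilde s}$ tends to zero, but only logarithmically.

```python
import numpy as np

def rates(n, s):
    """Return (delta_n^M(s), delta_n^AM(s), delta_n^Q(s)) without constants."""
    s_tilde = 4.0 * s + 1.0
    minimax = float(n) ** (-2.0 * s / s_tilde)                           # Ingster, s known
    adaptive_minimax = (n ** 2 / np.log(np.log(n))) ** (-s / s_tilde)   # Spokoiny
    tapering = (n ** 2 / np.log(n)) ** (-s / s_tilde)                   # optimal for tapering
    return minimax, adaptive_minimax, tapering

for n in (10 ** 2, 10 ** 4, 10 ** 8):
    m, am, q = rates(n, s=1.0)
    print(n, am / q)    # equals (log log n / log n)^{s / s~}
```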
Let us also remark that adaptive optimality can be established in the same
manner as in the proof of Corollary 3.1.2.ii for the class of test statistics (1.4)
with $w_{n,j} = \{j (\log j)^{\gamma}\}^{-1/2}$ such that $\gamma < 1$. The setting $\gamma = 0$, the special case
that defines $Q_n^{\mathrm{PT}}$, is preferred for simplicity.

4. Comparisons among high-dimensional tests 

In this section, the various high-dimensional tests alluded to in this article are 
described in detail and evaluated by simulation, with the goal of comparing the 
power-properties of tapering, truncation, and thresholding mechanisms. 

The tests are defined here in the context of the discrete model (1.2), taking
$v = 1$, and are compared assuming independent Gaussian errors with
known variances. Indeed, most of the available theoretical results are derived
in this context, including those of Ingster (1993) and Spokoiny (1996). (Yet
Fan and Lin, 1998, and Fan and Huang, 2001, establish that the good
power properties of Fan's (1996) adaptive Neyman test are robust under weaker
assumptions.) To reflect the setting $v = 1$, this section will treat $Y_{n,j}$, $\theta_{n,j}$, and $\epsilon_{n,j}$
in (1.2) as scalars. Each test is defined below by
stating a test statistic; it should be assumed that the corresponding test rejects
the null hypothesis when the test statistic exceeds a fixed cutoff.

4.1. Tests based on tapering

A test based on tapering that is known to achieve Ingster's minimax rate is
defined by the test statistic $FZZ_n = n \sum_{j=1}^{p_n} w_{n,j} Y_{n,j}^2$, with weights given
by $w_{n,j} = 1 - j^{4s} \xi_n^2 / (1 + j^{2s} \xi_n)^2$ and $\xi_n = n^{-4s/\tilde{s}}$. Optimal performance of
this test was first deduced in Fan, Zhang, and Zhang (2001) for a slight variation
in which the test statistic is expressed as an infinite quadratic form,
$FZZ_n^{\infty} = n \sum_{j=1}^{\infty} w_{n,j} Y_{n,j}^2$. Spitzner (2008A, ex. 2) clarifies that such
performance is retained by the finite-sum version, provided $n^{2/\tilde{s}} = O(p_n)$.
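The weights of $FZZ_n$ can be computed directly from the formula above. In the sketch below (hypothetical helper names, $v = 1$), writing $x = j^{2s}\xi_n$ shows that $w_{n,j} = 1 - x^2/(1+x)^2$ equals $3/4$ where $x = 1$, that is, near $j = n^{2/\tilde s}$, the scale to which the condition $n^{2/\tilde s} = O(p_n)$ refers. For contrast, the weights $j^{-1/2}$ of $Q_n^{\mathrm{PT}}$ do not depend on $s$.

```python
import numpy as np

def fzz_weights(n, s, p):
    """w_{n,j} = 1 - j^{4s} xi_n^2 / (1 + j^{2s} xi_n)^2, with xi_n = n^{-4s/(4s+1)}."""
    j = np.arange(1, p + 1, dtype=float)
    xi = float(n) ** (-4.0 * s / (4.0 * s + 1.0))
    x = j ** (2.0 * s) * xi                 # note j^{4s} xi^2 = x^2
    return 1.0 - x ** 2 / (1.0 + x) ** 2

def tapered_statistic(y, w, n):
    """n * sum_j w_j y_j^2: a tapered quadratic form as in (1.4) with v = 1."""
    y = np.asarray(y, dtype=float)
    return n * float(np.sum(np.asarray(w, dtype=float) * y ** 2))

n, p = 64, 127
w_fzz = fzz_weights(n, s=1.0, p=p)                 # depends on s
w_pt = np.arange(1, p + 1, dtype=float) ** -0.5    # Q_n^PT weights, free of s
print(n ** (2 / 5.0), w_fzz[4], w_fzz[5])          # crossover near j = n^{2/s~}
```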

A minimax test based on a simpler quadratic form is studied in Ingster
(1993). It is defined by the unweighted statistic $UWQ_n = n \sum_{j=1}^{p_n} Y_{n,j}^2$, and has
$\delta_n^{\mathrm{M}}(s) = n^{-2s/\tilde{s}}$ as a boundary rate, provided $p_n \asymp n^{2/\tilde{s}}$. A pitfall of this test
is that it is extremely sensitive to the rate at which $p_n$ increases, which puts
it at a serious disadvantage in the FDA context. For instance, if $p_n \asymp n^{2/\gamma}$
for $\gamma \ne \tilde{s}$, it is possible to find sequences $\delta_n = n^{-(2-t)s/\tilde{s}}$ with $t > 0$ that
fail to satisfy the rates-of-testing criterion (3.2). This is a drastic degradation
in performance, well beyond that incurred to achieve adaptive-minimaxity or
optimal adaptive performance among tests based on tapering. The power of the
test based on $UWQ_n$ is invariant on the contours of $\sum_{j=1}^{p_n} \theta_{n,j}^2$, a property that
is sometimes touted as a practical advantage; for present purposes this property
will be exploited to calibrate the simulation design.
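Under Gaussian errors this invariance turns the calibration into a one-dimensional root search. The sketch below assumes (an assumption of the sketch, not a statement of this section) that $\sqrt{n}\,(Y_{n,j} - \theta_{n,j})$ is standard normal, so that $UWQ_n$ follows a noncentral chi-square distribution with $p_n$ degrees of freedom and noncentrality $n\sum_j \theta_{n,j}^2$. It solves for the noncentrality giving power 0.4 at level 0.05 using SciPy's `chi2`, `ncx2`, and `brentq`.

```python
from scipy.optimize import brentq
from scipy.stats import chi2, ncx2

def calibrate_noncentrality(p_n, power=0.4, alpha=0.05):
    """Noncentrality n * sum_j theta_{n,j}^2 at which UWQ_n has the requested power."""
    crit = chi2.ppf(1.0 - alpha, p_n)                  # null cutoff of UWQ_n
    excess_power = lambda nc: ncx2.sf(crit, p_n, nc) - power
    return brentq(excess_power, 1e-8, 10.0 * p_n)      # power increases with nc

p_n, n = 127, 64
nc = calibrate_noncentrality(p_n)
lam = nc / n     # sum_j theta_{n,j}^2 shared by every calibrated alternative
```

Any alternative rescaled so that $\sum_j \theta_{n,j}^2$ equals `lam` then gives the test based on $UWQ_n$ the same power, whatever the shape of the alternative.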

Note that tests based on $FZZ_n$, $FZZ_n^{\infty}$, and $UWQ_n$ each require a precise
specification of the parameter $s$ to achieve minimax performance. Consequently,
since $s$ can rarely be specified exactly, each is difficult to implement in practice.
In the present exercise, these particular tests should be regarded not as candidates
for practical usage, but as conceptual references against which to compare
the properties of other tests.

Two other tests based on tapering will also be evaluated. One is, of course,
that based on $Q_n^{\mathrm{PT}}$, an adaptively optimal test of this class. The other is
defined by the test statistic $CVM_n = n \sum_{j=1}^{p_n} j^{-2} Y_{n,j}^2$, which is motivated by
the classical Cramér-von Mises statistic of goodness-of-fit testing. The Cramér-von
Mises statistic is traditionally expressed as an integrated squared difference
between empirical and hypothesized distribution functions, but it is studied
in Eubank and LaRiccia (1992) via the representation $n \sum_{j \ge 1} j^{-2} (Y_{n,2j-1}^2 +
Y_{n,2j}^2)$, giving rise to the statistic $CVM_n$ studied here. This test is included as
an example of a widely used high-dimensional test that exhibits rather poor
performance in an FDA context. For instance, it can be shown that, due to its
strong down-weighting of higher-indexed components, its best possible boundary
rates are $\delta_n = n^{-(2-t)s/\tilde{s}}$ for $t = 2\{1 - \tilde{s}/(\tilde{s} + 3)\} > 0$.
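The exponent of this boundary rate can be compared with Ingster's $2s/\tilde s$. Since $2 - t = 2\tilde s/(\tilde s + 3)$, the rate simplifies to $\delta_n = n^{-2s/(\tilde s + 3)}$; the short sketch below evaluates both exponents (a larger exponent means a faster-shrinking detectable gap).

```python
def cvm_exponent(s):
    """Exponent e in delta_n = n^{-e} for CVM_n: e = (2 - t) s / s~ with t = 2{1 - s~/(s~ + 3)}."""
    s_tilde = 4.0 * s + 1.0
    t = 2.0 * (1.0 - s_tilde / (s_tilde + 3.0))
    return (2.0 - t) * s / s_tilde

def minimax_exponent(s):
    """Exponent of Ingster's minimax rate n^{-2s/s~}."""
    return 2.0 * s / (4.0 * s + 1.0)

for s in (0.5, 1.0, 2.0):
    print(s, cvm_exponent(s), minimax_exponent(s))
```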

4.2. Adaptive-minimax tests based on truncation and thresholding

Two tests that achieve Spokoiny's adaptive-minimax rate shall be considered,
one based on truncation and the other on thresholding. The test based on truncation
is Fan's (1996) adaptive Neyman test, whose test statistic is $AN_n =
\max_{k=1,\ldots,p_n} (N_{n,k} - k)/\sqrt{k}$, for which $N_{n,k} = n \sum_{j=1}^{k} Y_{n,j}^2$. Sometimes this test
is interpreted via the scheme known as "Neyman's truncation," which describes
the test statistic by $N_{n,k_n}$, viewing $k_n$ as a data-driven diagnostic; to yield
$AN_n$, $k_n$ is that $k$ which maximizes the standardized sums $(N_{n,k} - k)/\sqrt{k}$. Other
choices of $k_n$ are discussed in Rayner and Best (1989), Eubank and Hart (1992),
Eubank and LaRiccia (1992), Inglot and Ledwina (1996), Eubank (2000), Aerts,
Claeskens, and Hart (2000), Claeskens and Hjort (2004), and elsewhere. Among
them, the test based on $AN_n$ has the most well-established rates-of-testing properties
and compares most favorably in the empirical-power investigation of
Spitzner (2006), which is similar to the investigation carried out in what follows.
It is shown in Fan, Zhang, and Zhang (2001) to achieve Spokoiny's adaptive-minimax
rate, at least for the case where $p_n = n$.
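A direct transcription of $AN_n$, also returning the maximizing $k_n$ of the Neyman-truncation interpretation, might look as follows. This is a hypothetical helper for $v = 1$, using the $\sqrt{k}$ normalization exactly as written above.

```python
import numpy as np

def adaptive_neyman(y, n):
    """Return (AN_n, k_n): AN_n = max_k (N_{n,k} - k) / sqrt(k), N_{n,k} = n sum_{j<=k} y_j^2."""
    y = np.asarray(y, dtype=float)
    k = np.arange(1, y.size + 1)
    N = n * np.cumsum(y ** 2)                   # N_{n,1}, ..., N_{n,p_n}
    standardized = (N - k) / np.sqrt(k)
    best = int(np.argmax(standardized))
    return float(standardized[best]), best + 1  # k_n reported 1-based

print(adaptive_neyman([3.0, 0.0, 0.0, 0.0], n=1))   # (8.0, 1)
print(adaptive_neyman([0.0, 0.0, 0.0, 3.0], n=1))   # (2.5, 4)
```

The second call shows the truncation point moving out to wherever the signal sits, which is how the test retains power against high-indexed departures.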

The test studied here based on thresholding is introduced in Spokoiny (1996).
This test is readily applicable under the discrete model (1.2), but its formulation
takes the continuous model (1.1) to have been translated to (1.2) using
wavelet decomposition rather than Fourier decomposition. For present purposes
it is unnecessary to describe the complete details of that translation, except
to note that its organization of component subscripts uses "wavelet indexing":
$j = j(k, l)$ denotes the $l$'th component at the $k$'th level of resolution, for $k = 0, 1, \ldots$
and $l = 1, \ldots, 2^k$, where $j(0, 1) = 1$, $j(k+1, 1) = j(k, 1) + 2^k$, and $j(k, l+1) = j(k, l) + 1$.
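Unrolling the recursion gives the closed form $j(k, l) = 2^k + l - 1$ (a derived identity, not stated in the original). The sketch below implements both forms and checks that levels $k = 0, \ldots, 6$ fill exactly the subscripts $1, \ldots, 127$.

```python
def wavelet_index(k, l):
    """j(k, l) = 2^k + l - 1, the l'th component at resolution level k."""
    if k < 0 or not 1 <= l <= 2 ** k:
        raise ValueError("need k >= 0 and 1 <= l <= 2**k")
    return 2 ** k + l - 1

def wavelet_index_recursive(k, l):
    """The same subscript built from j(0,1) = 1, j(k+1,1) = j(k,1) + 2^k, j(k,l+1) = j(k,l) + 1."""
    j = 1
    for level in range(k):
        j += 2 ** level       # first component of the next level
    return j + (l - 1)        # step along the level

subscripts = [wavelet_index(k, l) for k in range(7) for l in range(1, 2 ** k + 1)]
print(subscripts[:7], len(subscripts))   # [1, 2, 3, 4, 5, 6, 7] 127
```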
The test statistic is constructed in two stages. In the first stage, a class of
tests indexed by $s > 1/2$ is constructed as follows. Define

$$HT_n(s) = \sum_{k=0}^{k_{*n}(s)} \sum_{l=1}^{2^k} \big[ n Y_{n,j(k,l)}^2 - 1 \big] + \sum_{k=k_{*n}(s)+1}^{k^*} \sum_{l=1}^{2^k} \big[ n Y_{n,j(k,l)}^2\, I\{\sqrt{n}\, |Y_{n,j(k,l)}| > \xi_{n,k}(s)\} - \mu_{HT}(\xi_{n,k}(s)) \big]$$

for parameters $k_{*n}(s)$, $k^*$, and a "hard-thresholding" parameter $\xi_{n,k}(s)$, where
$\mu_{HT}(\xi) = E[\eta^2 I\{|\eta| > \xi\}]$ with $\eta$ following a standard normal distribution. The
settings applied here take $k^* = \lceil \log_2 n \rceil$, $k_{*n}(s) = \lceil (2s + 1/2)^{-1} \log_2 n \rceil$, and
$\xi_{n,k}(s) = \sqrt{(k - k_{*n}(s) + 8) \log 2}$, where $\lceil x \rceil$ denotes the smallest integer greater than
or equal to $x$. In the second stage, a range $s_* \le s \le s^*$ is assumed to have been
specified, over which adaptively optimal performance is to be achieved. The
$HT_n(s)$ corresponding to $s$ in that range are combined into the test statistic
$HT_n = \max_{s_* \le s \le s^*} HT_n(s)/c_{n,\alpha}(s)$, where $c_{n,\alpha}(s)$ is the cutoff for a size-$\alpha$ test
based on $HT_n(s)$. (Note there is only a finite number of distinct $HT_n(s)$ across
$s_* \le s \le s^*$.)
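A sketch of $HT_n(s)$ under the modified settings above follows. Three points are assumptions of the sketch rather than statements of the text: $\sqrt{n}\,Y_{n,j}$ is taken to be standard normal under the null; $\mu_{HT}$ is evaluated in closed form, $\mu_{HT}(\xi) = 2\{\xi\phi(\xi) + 1 - \Phi(\xi)\}$ (by integration by parts, with $\phi$ and $\Phi$ the standard normal density and CDF), whereas the simulation study computes it by simulation; and the input is wavelet-indexed with length at least $2^{k^*+1} - 1$. The maximization over $s$ that forms $HT_n$ needs the cutoffs $c_{n,\alpha}(s)$ and is omitted.

```python
import numpy as np
from scipy.stats import norm

def mu_ht(xi):
    """mu_HT(xi) = E[eta^2 I{|eta| > xi}] for standard normal eta, in closed form."""
    return 2.0 * (xi * norm.pdf(xi) + norm.sf(xi))

def ht_statistic(y, n, s):
    """HT_n(s) with k* = ceil(log2 n), k_*n(s) = ceil(log2(n) / (2s + 1/2)) and
    xi_{n,k}(s) = sqrt((k - k_*n(s) + 8) log 2); y[j-1] holds Y_{n,j} in wavelet order."""
    y = np.asarray(y, dtype=float)
    k_star = int(np.ceil(np.log2(n)))
    k_low = int(np.ceil(np.log2(n) / (2.0 * s + 0.5)))
    if y.size < 2 ** (k_star + 1) - 1:
        raise ValueError("y must cover resolution levels 0, ..., k*")
    total = 0.0
    for k in range(k_star + 1):
        z2 = n * y[2 ** k - 1: 2 ** (k + 1) - 1] ** 2   # n Y_{n,j(k,l)}^2, l = 1..2^k
        if k <= k_low:
            total += float(np.sum(z2 - 1.0))            # un-thresholded coarse levels
        else:
            xi = np.sqrt((k - k_low + 8) * np.log(2.0))
            kept = z2 * (np.sqrt(z2) > xi)               # hard thresholding
            total += float(np.sum(kept - mu_ht(xi)))
    return total
```

Subtracting $\mu_{HT}$ centres each thresholded term at zero under the null, so coarse and fine levels enter the sum on a common footing.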

The settings for $k_{*n}(s)$ and $\xi_{n,k}(s)$ given above are not those originally
specified in Spokoiny (1996), but are modified versions suitable for the non-asymptotic
context considered here. The original settings are $k_{*n}(s) = \lceil (2s +
1/2)^{-1} \log_2(Mn) \rceil$, where $M$ is as in $\mathcal{B}_{s,M}$, and the hard-thresholding parameter
is $\xi_{n,k}(s) = 4\sqrt{(k - k_{*n}(s) + 8) \log 2}$. With these, Spokoiny (1996) establishes
that each test based on $HT_n(s)$ achieves Ingster's minimax rate, and
the test based on $HT_n$ achieves Spokoiny's adaptive-minimax rate. However,
Abramovich et al. (2002) remark that in the original setting for $\xi_{n,k}(s)$ the
leading constant of four is "unreasonably high" for "finite sample situations,"
presumably referring to data configurations one would tend to work with in
practice, and propose that it be replaced with one, as is done here. The original
setting for $k_{*n}(s)$ is problematic in that it depends on $M$. Abramovich et al.
(2002) handle this with application-specific, ad hoc adjustments; the setting
here matches the ratio $k^*/k_{*n}(s)$ to the limit of what it would be under the
original setting.



4.3. An empirical comparison



The powers of the tests described in Sections 4.1 and 4.2 are now compared
in a simulation exercise involving two classes of high-dimensional alternatives.
The first class consists of "spiked" alternatives, for which $\theta_j$ is taken to have
$\theta_j = \sqrt{\lambda}$ for $j = j_0$ and $\theta_j = 0$ otherwise, where $j_0$ is a subscript that indexes
the class, and $\lambda$ is set to calibrate each alternative so that the power of the
test based on $UWQ_n$ is 0.4. The second class consists of "smooth" alternatives,
which are parameterized according to $\theta_j = \sqrt{\lambda}\{1 - j/(p_n + 1)\}^d / c$, where
$c^2 = \sum_{j=1}^{p_n} \{1 - j/(p_n + 1)\}^{2d}$, $d = \{\log 0.2/\log(1 - b) - 1\}/2$ for $0 < b < 1$, and $\lambda$ is as in the
spiked class of alternatives. This formula is derived from the inverse CDF of the
beta distributions, and is such that the partial sum $\sum_{j=1}^{J} \theta_j^2$ is approximately
80% of its value at $J = p_n$ when $J$ is approximately $100b\%$ of $p_n$. The class is
indexed by the parameter $b$. Examples of each type of alternative are displayed
as upper-right insets in the panels of Figure 4.

Fig 4. Simulated power for high-dimensional tests against spiked and smooth alternatives.
Power curves associated with $FZZ_n$ are marked by solid lines with circles (○), those with
$UWQ_n$ by dotted lines, those with $Q_n^{\mathrm{PT}}$ by thick solid lines, those with $CVM_n$ by dashed
lines, those with $AN_n$ by solid lines with dots (•), and those with $HT_n$ by solid lines with squares
(□). Insets display the shapes of example alternatives, with corresponding index values.
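Both classes of alternatives are easy to generate. The sketch below (hypothetical helper names, $v = 1$, with $\lambda$ supplied by the user) builds them from the formulas above and checks the stated 80% property numerically. The property holds only approximately, because the sum over $j$ discretizes the beta distribution function $1 - (1 - x)^{2d+1}$ that the choice of $d$ inverts.

```python
import numpy as np

def smooth_alternative(p_n, b, lam=1.0):
    """theta_j = sqrt(lam) {1 - j/(p_n + 1)}^d / c, scaled so that sum_j theta_j^2 = lam."""
    j = np.arange(1, p_n + 1)
    d = (np.log(0.2) / np.log(1.0 - b) - 1.0) / 2.0
    base = (1.0 - j / (p_n + 1.0)) ** d
    c = np.sqrt(np.sum(base ** 2))
    return np.sqrt(lam) * base / c

def spiked_alternative(p_n, j0, lam=1.0):
    """theta_j = sqrt(lam) at j = j0 (1-based) and 0 otherwise."""
    theta = np.zeros(p_n)
    theta[j0 - 1] = np.sqrt(lam)
    return theta

p_n = 127
for b in (0.2, 0.5):
    theta = smooth_alternative(p_n, b)
    J = int(round(b * p_n))
    print(b, J, np.sum(theta[:J] ** 2) / np.sum(theta ** 2))   # roughly 0.8
```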

Each spiked alternative satisfies the technical property of maximally compressing
non-zero components into higher indices among alternatives with constant
$\sum_{j=1}^{\infty} j^{2s} \theta_j^2$. Mathematical proofs of minimaxity single out these alternatives
as yielding the minimum power, which is to be maximized (cf. th. 4 of
Fan, Zhang, and Zhang, 2001, and th. 1 of Spitzner, 2008A). In other words,
the class of spiked alternatives represents those alternatives that are the hardest
to distinguish. These may be of primary interest in some specialized FDA
applications, such as those involving PET-fMRI images (cf. Abramovich et al.,
2002). On the other hand, alternatives in the smooth class are idealized representations
of those of primary concern in more typical FDA testing problems.
For instance, in the Canadian temperature example, interest centers on such
large-scale attributes as the average yearly temperature and the differential between
winter and summer temperatures, attributes that are expressed in the
lower-indexed $\theta_j$. The smooth alternatives reflect this and similar situations by
expressing the departure from the null hypothesis mainly through these low-indexed
components.

The simulation design is such that the models examined each have dimensionality
$p_n = 127$, which was selected to accommodate seven levels of resolution
in the wavelet-indexing scheme, at $k = 0, 1, \ldots, 6$. (This value is not atypical
in FDA applications; e.g., $p_n = 100$ in Fan and Lin, 1998, and $p_n = 124$
in Spitzner, Marron, and Essick, 1998.) The sample-size parameter is set to
$n = 64$, which was chosen since the statistic $HT_n$ then has $k^* = 6$, so that
the test is sensitive to all available levels of resolution.

Simulated power is calculated at twenty alternatives from each of the spiked-
and smooth-alternative classes. The specific alternatives evaluated are such that
their corresponding class indices are roughly evenly spaced across the ranges
$j_0 = 1, \ldots, p_n$ for the spiked class and $0.01 \le b \le 0.80$ for the smooth class,
including the stated endpoints.

Some intricacy is required to deduce reasonable specifications for the parameters
of the Sobolev geometry, $\mathcal{B}_{s,M}$, which are needed to construct the
statistics $FZZ_n$ and $HT_n$. For spiked alternatives, the bound in $\mathcal{B}_{s,M}$ is set to
$M = \{\lambda(p_n + 1)\}^{1/2}$, so that the Sobolev norm of the alternative indexed at $j_0 = p_n + 1$
would have $\sum_{j=1}^{\infty} j^{2s} \theta_j^2 = M^2$ at $s = 1/2$, if such a setting for $s$ were allowed.
Then, fixing $M$ at this value, the parameter $s$ is set at different values for different
alternatives: for $j_0 = 2, \ldots, p_n$, the parameter is $s = \log(p_n + 1)/(2 \log j_0)$,
which solves $\sum_{j=1}^{\infty} j^{2s} \theta_j^2 = M^2$; for $j_0 = 1$, $s$ is set to its value at $j_0 = 2$.
For smooth alternatives, $M^2$ is first set to the value $\sum_{j=1}^{p_n} j^{2s} \theta_j^2$ obtained with
$s = 1/2$ and the $\theta_j$ defined by indexing at $b = 0.81$. The parameter $s$ is then set
numerically at each alternative evaluated, for which $b \le 0.80$, to the value that
solves $\sum_{j=1}^{p_n} j^{2s} \theta_j^2 = M^2$.
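The choice of $s$ described here reduces to a closed form for spiked alternatives and to a one-dimensional root search for smooth ones. The sketch below uses hypothetical helpers and sets $\lambda = 1$, which loses nothing because $\lambda$ cancels from both sides of the defining equation. It reproduces $s = 0.5008$ at $j_0 = 127$ and solves for $s$ at smooth alternatives with SciPy's `brentq`.

```python
import numpy as np
from scipy.optimize import brentq

def spiked_s(j0, p_n):
    """s = log(p_n + 1) / (2 log j0); for j0 = 1 the value at j0 = 2 is used."""
    return np.log(p_n + 1.0) / (2.0 * np.log(max(j0, 2)))

def sobolev_sq(theta, s):
    """sum_j j^{2s} theta_j^2 over the supplied components."""
    j = np.arange(1, len(theta) + 1)
    return float(np.sum(j ** (2.0 * s) * np.asarray(theta) ** 2))

def smooth_alternative(p_n, b):
    """Smooth alternative of the simulation design with lambda = 1."""
    j = np.arange(1, p_n + 1)
    d = (np.log(0.2) / np.log(1.0 - b) - 1.0) / 2.0
    base = (1.0 - j / (p_n + 1.0)) ** d
    return base / np.sqrt(np.sum(base ** 2))

p_n = 127
M2 = sobolev_sq(smooth_alternative(p_n, 0.81), 0.5)   # M^2 fixed at b = 0.81, s = 1/2

def smooth_s(b):
    """Solve sum_j j^{2s} theta_j^2 = M^2 for s at the smooth alternative indexed by b."""
    theta = smooth_alternative(p_n, b)
    return brentq(lambda s: sobolev_sq(theta, s) - M2, 0.01, 10.0)

# The text reports 0.5008 (spiked, j0 = 127), 0.5017 (b = 0.80) and 2.4680 (b = 0.01).
print(round(spiked_s(127, p_n), 4), smooth_s(0.80), smooth_s(0.01))
```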

The bounds $s_*$ and $s^*$, which are required to construct $HT_n$, are taken as
the lower and upper values of $s$ calculated in the scheme above, treating the
two classes of alternatives separately. The selected spiked alternatives yield the
bounds $s_* = 0.5008$ (at $j_0 = 127$) and $s^* = 1.1667$ (at $j_0 = 1$); the selected
smooth alternatives yield $s_* = 0.5017$ (at $b = 0.80$) and $s^* = 2.4680$ (at $b =
0.01$). Under these settings, the parameter $k_{*n}(s)$ varies within a very narrow
range: $k_{*n}(s) = 4, 5$ for spiked alternatives, and $k_{*n}(s) = 3, 4, 5$ for smooth
alternatives. Consequently, the simulated power of the test based on $HT_n$ is
nearly identical to that of each test based on $HT_n(s)$.

Simulated power curves are displayed in the panels of Figure 4 for tests based
on $FZZ_n$, $UWQ_n$, $Q_n^{\mathrm{PT}}$, $CVM_n$, $AN_n$, and $HT_n$. Test-statistic cutoffs and
the quantities $\mu_{HT}(\xi_{n,k}(s))$ of the statistic $HT_n$ were calculated by simulation,
and all tests were carried out at the $\alpha = 0.05$ level. Every simulation used a
minimum of 250,000 iterations. The left panel of Figure 4 displays results for the
spiked alternatives, and the right panel those for the smooth alternatives. The
test based on $UWQ_n$ serves well as a basis of comparison
for the other tests, since its simulated power curve is near-constant at
the value 0.4.
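A scaled-down version of this simulation can be sketched as follows. It is not the code behind Figure 4: it covers only $UWQ_n$, $Q_n^{\mathrm{PT}}$, and $AN_n$, uses far fewer iterations, and assumes that the $\sqrt{n}\,(Y_{n,j} - \theta_j)$ are independent standard normal variables, with $\lambda$ calibrated through the noncentral chi-square distribution of $UWQ_n$. Cutoffs are estimated from null draws, as in the text.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2, ncx2

def calibrated_lambda(p_n, n, power=0.4, alpha=0.05):
    """sum_j theta_j^2 giving UWQ_n the requested power (noncentral chi-square)."""
    crit = chi2.ppf(1.0 - alpha, p_n)
    nc = brentq(lambda x: ncx2.sf(crit, p_n, x) - power, 1e-8, 10.0 * p_n)
    return nc / n

def smooth_alternative(p_n, b, lam):
    j = np.arange(1, p_n + 1)
    d = (np.log(0.2) / np.log(1.0 - b) - 1.0) / 2.0
    base = (1.0 - j / (p_n + 1.0)) ** d
    return np.sqrt(lam) * base / np.sqrt(np.sum(base ** 2))

def statistics(z):
    """Rows of z hold sqrt(n) Y_{n,j}; columns of the result are UWQ_n, Q_n^PT, AN_n."""
    j = np.arange(1, z.shape[1] + 1)
    z2 = z ** 2
    uwq = z2.sum(axis=1)
    qpt = z2 @ (j ** -0.5)
    an = ((np.cumsum(z2, axis=1) - j) / np.sqrt(j)).max(axis=1)
    return np.column_stack([uwq, qpt, an])

def simulated_power(theta, n, reps=20000, alpha=0.05, seed=0):
    """Monte Carlo power at mean vector theta, with cutoffs estimated from null draws."""
    rng = np.random.default_rng(seed)
    p = theta.size
    cut = np.quantile(statistics(rng.standard_normal((reps, p))), 1.0 - alpha, axis=0)
    alt = statistics(np.sqrt(n) * theta + rng.standard_normal((reps, p)))
    return dict(zip(("UWQ", "Q_PT", "AN"), (alt > cut).mean(axis=0)))

n, p_n = 64, 127
lam = calibrated_lambda(p_n, n)
spiked = np.zeros(p_n)
spiked[p_n - 1] = np.sqrt(lam)        # hardest spike, j0 = p_n
print(simulated_power(spiked, n))
print(simulated_power(smooth_alternative(p_n, 0.01, lam), n))
```

Under these assumptions the $UWQ_n$ column should sit near 0.4 for both alternatives, while $Q_n^{\mathrm{PT}}$ should be far above it at the smooth alternative with $b = 0.01$; exact values depend on the seed.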

Examining first the results for spiked alternatives displayed in the left panel,
the benefit of the test based on $HT_n$ is glaring.
Whereas the simulated power curve of every other test follows the same pattern
of starting out high at low values of $j_0$, then dropping sharply and later leveling
out well below 0.4, that of $HT_n$ is nearly flat at 0.6, representing a consistent
50% increase in power above that of the test based on $UWQ_n$. Among the
remaining tests, the simulated power curves of the tests based on $FZZ_n$, $Q_n^{\mathrm{PT}}$,
and $AN_n$ appear quite similar, and reflect performance that is not altogether
poor, especially at the lower values of $j_0$. The test based on $CVM_n$, however,
exhibits exceptionally poor performance.

The picture changes drastically for the smooth alternatives. In the right panel,
the considerable advantage of the test based on $HT_n$ against spiked alternatives
disappears; its simulated power, in fact, appears generally low relative to the
other tests, and the same can be said of the test based on $AN_n$, although
to a lesser degree. Put another way, it is the tests based on tapering
that exhibit superior performance in this context, specifically those based on
$FZZ_n$ and $Q_n^{\mathrm{PT}}$. The test based on $FZZ_n$ outperforms that based on $Q_n^{\mathrm{PT}}$,
and this is not altogether surprising given that the former satisfies a stronger
asymptotic property. Simulated power of the test based on $CVM_n$ is still quite
poor, although not as dismal as it is for spiked alternatives.

Though the tests based on truncation and thresholding are known to have
superior asymptotic performance, these empirical results suggest quite clearly
that the higher-order asymptotic factors wash out slowly, and may very well
have a substantial influence in data configurations one is likely to encounter in
practice. All of the tests are shown to be sensitive to the shape of the alternative,
and the test based on thresholding appears especially sensitive in this regard.
These observations substantiate a recommendation to the analyst that, when
choosing among high-dimensional tests in FDA, it is prudent to consider the
types of departures from the null hypothesis one is most interested in detecting.
The test based on $Q_n^{\mathrm{PT}}$ may best suit the goals of the analysis, despite its
asymptotic inferiority to truncation and thresholding, and it likely will in typical
FDA applications where the primary interest is in the large-scale attributes of
the model.

Let us briefly return to an observation made in the Canadian temperature
example. As has been suggested, the focus on large-scale attributes puts the
goals of the analysis in line with detecting smooth-shaped alternatives, and, in
light of the current simulation results, the test based on $F_{\mathrm{global}}$ is best suited to
that purpose. However, recall that the test rejected $H_0$ on the example data, and
exploratory analysis indicated the presence of a spiked-shaped alternative. Thus
we have a practical illustration of a test based on tapering evidently detecting an
alternative among those hardest to distinguish, thereby further demonstrating
the global power of the test.

5. Concluding discussion 

Through an application example and both theoretical and empirical power investigations,
this article has demonstrated the benefits of the tapering mechanism
in FDA, and of the test based on $Q_n^{\mathrm{PT}}$ in particular. It has been shown
how tests based on tapering may be constructed on a functional linear model,
using Fourier decomposition to first translate to a high-dimensional discrete
model. Intuition for the discrete model and tests based on tapering has been
discussed through an example analysis of the Canadian temperature data. Applying
criteria from rates-of-testing theory, it has been shown that the weight
setting $w_{n,j} = j^{-1/2}$ in (1.4), which defines $Q_n^{\mathrm{PT}}$, represents an adaptively
optimal configuration among tests based on tapering. Moreover, the test based
on $Q_n^{\mathrm{PT}}$ retains optimality over a wide range of rates at which $p_n \to \infty$, a
property that is particularly advantageous in the FDA context. The discussion
reiterates that any test based on tapering can be outperformed asymptotically
by tests based on truncation or thresholding. Nevertheless, an empirical investigation
in a non-asymptotic context has demonstrated that high-order effects
may be non-negligible in practice, and that the test based on $Q_n^{\mathrm{PT}}$ has superior
power against a class of alternatives that would be of natural interest in many
FDA applications.

While this article argues in support of the tapering mechanism on the basis
of its power properties, there are a number of reasons completely separate from
power why an analyst would, at the outset, want to restrict attention to
tests based on tapering. The first is that the test statistic (1.4) may arise through
a formal Bayesian construction, as a monotone transformation of a posterior null
probability. This is shown in Spitzner (2008B), where the rates-of-testing framework
is developed entirely within a Bayesian context. It should be noted that
there are existing Bayesian constructions of the thresholding mechanism for use
in estimation, e.g., in Abramovich et al. (2007), which suggest the statistic $HT_n$
might be viewed as a summary metric applied to a Bayesian estimator. This falls
short of producing a formal Bayesian test based on $HT_n$, however, since it fails
to represent $HT_n$ as a monotone transformation of a posterior null probability.

Another reason that the tapering mechanism is attractive is its
straightforward intuition. The non-expert, once appreciating the goal to exploit
smoothness but still retain global power, would more quickly embrace the
intuition underlying the use of tapering than that of truncation or thresholding.
Tapering is easy to understand, as it directly incorporates all components of the
model while explicitly down-weighting the high-indexed ones. The latter mechanisms
are more complicated, harder to connect to the goal, and easy to apply
incorrectly. For these reasons, tapering may be the preferred recommendation
in consulting or interdisciplinary situations, and may offer the best insurance
for continued correct implementation once the statistician's involvement in a
project wanes.

Finally, the test statistic (1.4) is convenient in that it may be treated through
well-known analytical approximations (see, e.g., Mathai and Provost, 1992, sec.
4.6), rather than simulation. This may be a trivial advantage in straightforward
testing problems, given modern computing power, but it can help enormously
when developing more complicated high-dimensional statistical procedures. For
instance, Spitzner and Marshall (2009) make critical use of distributional approximations
of quadratic forms to develop a high-dimensional sequential monitoring
procedure based on tapering.
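As an illustration of such approximations, the simplest two-moment (Satterthwaite-type) scaled chi-square approximation to the null distribution of a tapered quadratic form is sketched below. It is a standard device chosen here for illustration, not necessarily one of the approximations treated in Mathai and Provost (1992). It assumes $v = 1$ and independent standard normal $Z_j = \sqrt{n}\,Y_{n,j}$ under the null, so the statistic is $\sum_j w_{n,j} Z_j^2$, and it compares the approximate cutoff with a Monte Carlo size estimate.

```python
import numpy as np
from scipy.stats import chi2

def scaled_chi2(w):
    """(a, nu) such that a * chi2_nu matches the mean and variance of sum_j w_j Z_j^2."""
    w = np.asarray(w, dtype=float)
    a = np.sum(w ** 2) / np.sum(w)          # mean a*nu = sum w, variance 2 a^2 nu = 2 sum w^2
    nu = np.sum(w) ** 2 / np.sum(w ** 2)
    return float(a), float(nu)

def approximate_cutoff(w, alpha=0.05):
    """Upper-alpha cutoff of the scaled chi-square approximation."""
    a, nu = scaled_chi2(w)
    return a * chi2.ppf(1.0 - alpha, nu)

w_pt = np.arange(1, 128, dtype=float) ** -0.5        # Q_n^PT weights with p_n = 127
cut = approximate_cutoff(w_pt)
rng = np.random.default_rng(0)
null_draws = (rng.standard_normal((40000, 127)) ** 2) @ w_pt
print(cut, np.mean(null_draws > cut))                # Monte Carlo size of the approximate cutoff
```

Because only two moments are matched, the approximation is exact in shape for flat weights and somewhat light-tailed for strongly decaying ones; the Monte Carlo size estimate makes that error visible.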

The results of this article encourage the use of the test based on $Q_n^{\mathrm{PT}}$ in
FDA, and establish a role for the tapering mechanism in rates-of-testing theory.
It is hoped that the ideas presented here are also found helpful in clarifying the
role of smoothness constraints in high-dimensional problems, and in providing
an accessible methodology with which to exploit them.